DIAMETER AND STATIONARY DISTRIBUTION OF RANDOM r-OUT DIGRAPHS

LOUIGI ADDARIO-BERRY, BORJA BALLE, AND GUILLEM PERARNAU


Abstract. Let $D(n,r)$ be a random $r$-out regular directed multigraph on the set of
vertices $\{1,\dots,n\}$. In this work, we establish that for every $r \ge 2$, there exists $\eta_r > 0$
such that $\operatorname{diam}(D(n,r)) = (1+\eta_r+o(1))\log_r n$. Our techniques also allow us to bound
some extremal quantities related to the stationary distribution of a simple random walk
on $D(n,r)$. In particular, we determine the asymptotic behaviour of $\pi_{\max}$ and $\pi_{\min}$, the
maximum and the minimum values of the stationary distribution. We show that with
high probability $\pi_{\max} = n^{-1+o(1)}$ and $\pi_{\min} = n^{-(1+\eta_r)+o(1)}$. Our proof shows that the
vertices with $\pi(v)$ near to $\pi_{\min}$ lie at the top of “narrow, slippery towers”; such vertices are
also responsible for increasing the diameter from $(1+o(1))\log_r n$ to $(1+\eta_r+o(1))\log_r n$.


1. Introduction 


Call a random directed graph $D$ with vertices $V(D) = \{v_1,\dots,v_n\}$ a random $r$-out
digraph if each vertex in $V(D)$ has out-degree $r$, and the $nr$ heads of edges in $E(D)$ are
iid and uniformly distributed over $V(D)$. We allow digraphs to have multiple edges and
loops. It is useful to have a canonical construction: for each pair $(i,j) \in [n] \times [r]$, let $L_{i,j}$
be a uniformly random element of $[n]$, and write $D(n,r)$ for the random $r$-out digraph
with vertex set $[n] = \{1,\dots,n\}$ and edge set $\{(i, L_{i,j}) : (i,j) \in [n] \times [r]\}$.
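The canonical construction can be sketched in a few lines of Python. This is only an illustration: the function names `sample_r_out_digraph` and `edge_list` are hypothetical, vertices are 0-indexed (so $[n]$ becomes `range(n)`), and `heads[i][j]` plays the role of $L_{i,j}$.

```python
import random

def sample_r_out_digraph(n, r, seed=None):
    """Sample the heads L_{i,j}: heads[i][j] is uniform on {0, ..., n-1}.

    Hypothetical helper; vertices are 0-indexed for convenience.
    Loops and multiple edges are allowed, as in the canonical construction.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(r)] for _ in range(n)]

def edge_list(heads):
    """Return the multiset of oriented edges (i, L_{i,j}) as a list of pairs."""
    return [(i, head) for i, row in enumerate(heads) for head in row]
```

Storing only the heads keeps the out-degree constraint implicit: every row has exactly $r$ entries.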

Given a digraph $D$, for $u,v \in V(D)$ we write $\operatorname{dist}(u,v) = \operatorname{dist}_D(u,v)$ for the number of
edges in a shortest oriented path from $u$ to $v$, or set $\operatorname{dist}(u,v) = \infty$ if there exists no such
path. The diameter of $D$ is

\[ \operatorname{diam}(D) = \max\{\operatorname{dist}(u,v) : u,v \in [n],\ \operatorname{dist}(u,v) < \infty\}. \tag{1.1} \]
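As an illustration of definition (1.1), the following sketch computes distances by breadth-first search and takes the maximum over all finite ones. It assumes the digraph is given as a list `heads`, where `heads[u]` lists the heads of the out-edges of vertex `u` (0-indexed); the function names are hypothetical.

```python
from collections import deque

def bfs_distances(heads, source):
    """Map each vertex reachable from `source` to its distance from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in heads[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(heads):
    """Largest finite distance between an ordered pair of vertices, as in (1.1)."""
    return max(max(bfs_distances(heads, u).values()) for u in range(len(heads)))
```

Unreachable pairs never appear in the dictionaries returned by `bfs_distances`, which matches the restriction to finite distances in (1.1). The running time is of order $n \cdot nr$, so this is meant for small experiments only.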


Say $D$ is strongly connected if $\operatorname{dist}(u,v) < \infty$ for all $u,v \in V(D)$. An induced subgraph
$D[S]$ of $D$ is a strongly connected component of $D$ if $D[S]$ is strongly connected but for
all $S'$ with $S \subsetneq S'$, $D[S']$ is not strongly connected. Given $S \subset V(D)$, say that $D[S]$ is
attractive if for all $v \in V(D)$ there is a directed path from $v$ to $S$. It is easily seen that a
digraph can contain at most one attractive strongly connected component $D[S]$.

If $D$ is strongly connected then a simple random walk on $D$ has a unique stationary
distribution $\pi = \pi_D$; in this case we write $\pi_{\max}(D) = \max\{\pi_D(v) : v \in V(D)\}$ and
$\pi_{\min}(D) = \min\{\pi_D(v) : v \in V(D)\}$. The diameter, and the values $\pi_{\max}$ and
$\pi_{\min}$, are natural extremal parameters associated with a digraph. In order to study cases
where $D$ is not necessarily strongly connected, we write $D_0 = D_0(n,r)$ for the strongly
connected component of $D(n,r)$ with the largest number of vertices (if there is more than
one such component, $D_0$ is the one whose smallest labelled vertex is minimal).
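A rough sketch of these objects for small instances follows. It is not taken from the paper: the names are hypothetical, the digraph is assumed to be given as a list `heads` of out-neighbour lists (0-indexed), the walk is restricted to the induced subgraph on the component, and a lazy walk is iterated because it has the same stationary distribution as the simple walk while avoiding periodicity issues.

```python
def reachable(adj, source):
    """Set of vertices reachable from `source` along the edges of `adj`."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def largest_scc(heads):
    """Largest strongly connected component, ties broken by smallest label."""
    n = len(heads)
    tails = [[] for _ in range(n)]          # reversed digraph
    for u, row in enumerate(heads):
        for w in row:
            tails[w].append(u)
    best, assigned = [], set()
    for v in range(n):                      # increasing labels settle ties
        if v in assigned:
            continue
        comp = reachable(heads, v) & reachable(tails, v)
        assigned |= comp
        if len(comp) > len(best):
            best = sorted(comp)
    return best

def stationary_distribution(heads, comp, steps=2000):
    """Approximate the stationary law of the simple walk on D[comp]."""
    if len(comp) == 1:
        return {comp[0]: 1.0}
    inside = set(comp)
    out = {u: [w for w in heads[u] if w in inside] for u in comp}
    pi = {u: 1.0 / len(comp) for u in comp}
    for _ in range(steps):
        nxt = {u: 0.5 * pi[u] for u in comp}        # lazy half-step: stay put
        for u in comp:
            share = 0.5 * pi[u] / len(out[u])       # multi-edges count twice
            for w in out[u]:
                nxt[w] += share
        pi = nxt
    return pi
```

With `pi = stationary_distribution(heads, largest_scc(heads))`, the quantities `max(pi.values())` and `min(pi.values())` are numerical stand-ins for $\pi_{\max}$ and $\pi_{\min}$. The fixed number of iterations is an arbitrary choice and is not a convergence guarantee.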

Let $\lambda_r = \max\{\lambda : 1 - \lambda = e^{-r\lambda}\}$ and let

\[ \eta_r = \frac{1}{\log_r (1-\lambda_r)^{-1} - 1} = \frac{\log r}{r\lambda_r - \log r}. \tag{1.2} \]
Observe that $\lambda_r \to 1$ and $\eta_r \to 0$ when $r \to \infty$. A sequence of random variables $X_n$
converges to $X$ in probability if for every $\epsilon > 0$, $\mathbb{P}(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$. If
$X_n/y_n \to X$ in probability then we also write $X_n = (X + o_p(1))y_n$.
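For concreteness, the constants $\lambda_r$ and $\eta_r$ can be evaluated numerically. The sketch below is an illustration with hypothetical function names; it assumes $r > 1$ and uses the fixed-point iteration $\lambda \mapsto 1 - e^{-r\lambda}$ started from $\lambda = 1$, which decreases monotonically to the largest solution of $1 - \lambda = e^{-r\lambda}$.

```python
import math

def lambda_r(r, iterations=200):
    """Largest root of 1 - lam = exp(-r * lam), assuming r > 1."""
    lam = 1.0
    for _ in range(iterations):
        lam = 1.0 - math.exp(-r * lam)
    return lam

def eta_r(r):
    """eta_r = log r / (r * lambda_r - log r), the second form in (1.2)."""
    lam = lambda_r(r)
    return math.log(r) / (r * lam - math.log(r))
```

For $r = 2$ this gives $\lambda_2 \approx 0.797$ and $\eta_2 \approx 0.770$, so the diameter constant $1 + \eta_2$ is roughly $1.77$.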

The study of $D(n,r)$ was initiated by Grusho, who showed that the size of its largest
strongly connected component satisfies $|V(D_0(n,r))| = (1 + o_p(1))\lambda_r \cdot n$ [17] (we remark
that this is also the asymptotic size of the giant component in the Erdős-Rényi random
graph $G(n,r/n)$). Because $\lambda_r > 1/2$ for all $r \ge 2$, it follows that $D_0(n,r)$ is with high
probability¹ the unique strongly connected component of its size. Motivated by the
average-case analysis of algorithms for the minimization of Deterministic Finite Automata (DFA),
Grusho’s result has been recently rediscovered by different sets of authors [6, 10, 11].

The diameter of $D(n,r)$ was first studied by Trakhtenbrot and Barzdin in [32, Theorem
5.5], who showed that for every $r \ge 2$ there exists a constant $C_r > 1$ such that with high
probability $\operatorname{diam}(D) \le C_r \log_r n$. Since $D(n,r)$ is $r$-out regular we always have the trivial
lower bound $\operatorname{diam}(D) \ge [\log_r (n-1)]$.

In [17], Grusho also showed that the unique largest strongly connected component $D_0$
is attractive with high probability. More recently, Balle [4] showed that $D_0(n,r)$ is whp
aperiodic, and so $D_0$ is ergodic. It follows that whp, the law of the position of a particle
performing a simple random walk on $D(n,r)$ converges to $\pi_{D_0(n,r)}$.

The contribution of this paper is to determine the first order asymptotic behaviour of
$\operatorname{diam}(D(n,r))$, $\pi_{\max}(D_0(n,r))$ and $\pi_{\min}(D_0(n,r))$, as $n$ becomes large.

Theorem 1.1. For every $r \ge 2$, we have $\operatorname{diam}(D(n,r)) = (1 + \eta_r + o_p(1))\log_r n$ and
$\operatorname{diam}(D_0(n,r)) = (1 + \eta_r + o_p(1))\log_r n$.

Theorem 1.2. For every $r \ge 2$, we have $\pi_{\max}(D_0(n,r)) = n^{-1+o_p(1)}$ and
$\pi_{\min}(D_0(n,r)) = n^{-(1+\eta_r)+o_p(1)}$.

Remarks. 

• The results obtained can be easily transferred to random simple $r$-out digraphs.
Let $D_{\mathrm{sim}}(n,r)$ be chosen uniformly at random from the set of directed simple
graphs (no loops or multiple edges) with vertex set $[n]$ such that each vertex has
out-degree $r$. The conditional distribution of $D(n,r)$, given that it is simple, is
precisely that of $D_{\mathrm{sim}}(n,r)$. Furthermore, it is not hard to show (see [24]) that

\[ \mathbb{P}(D(n,r) \text{ is simple}) = (1+o(1))\, e^{-r(r+1)/2}. \]

In particular, this probability is bounded away from zero for fixed $r$, so any property
that holds whp for $D(n,r)$ also holds whp for $D_{\mathrm{sim}}(n,r)$.

• It is not hard to deduce from our arguments that for all $u,v \in V(D(n,r))$,
conditional on the event that $v \in D_0(n,r)$, we have $\operatorname{dist}_{D(n,r)}(u,v) = (1 + o_p(1))\log_r n$.
This shows that the typical distance in $D(n,r)$ is $(1 + o_p(1))\log_r n$. The
argument, in brief, is as follows. First, the lower bound is easy by symmetry since
for all u € V{D{n,r)) we have |A^^(u)| < For the lower 

bound. Proposition 4.2 tells us that if N^[v) > log'^n then with high probability 
dist(u, u) <k + log^n. By Lemma 6.4 and Proposition 6.5, with k = (log log n)^, 
it follows straightforwardly that P(iV^(u) < log^n j v G Do(n, r)) = o(l), and the 
result follows. We leave the details to the interested reader. 

• The random $r$-out, $s$-in digraph $D(n,r,s)$ is defined similarly to $D(n,r)$, but each
vertex chooses $s$ in-neighbours as well as $r$ out-neighbours, all independently and
uniformly at random; see [16]. In particular, $D(n,r) \overset{d}{=} D(n,r,0)$.² It may be
interesting to consider the diameter and the stationary distribution for $D(n,r,s)$
when $s \ne 0$. One case follows from Theorem 1.1: since the diameter of a digraph
is the same as the diameter of the digraph obtained by flipping the direction of all
the edges, $\operatorname{diam}(D(n,0,r)) \overset{d}{=} \operatorname{diam}(D(n,r,0))$. In contrast, studying the stationary
distribution of $D(n,0,r)$ seems less interesting: typically there will be many
vertices with no out-edges where a simple random walk will eventually become
stuck.

¹ Here and for the remainder of the paper, with high probability, or whp, means with probability tending to
1 as $n \to \infty$.

² For any two random variables $X$ and $Y$, we use the notation $X \overset{d}{=} Y$ to denote that the corresponding
probability distributions are equal.

Outline. The paper is organized as follows. We start in Section 2 by discussing our 
motivation for addressing these problems and by putting our results in the context of 
other models for random (di)graphs. In Section 3 we introduce the notation that will be 
used throughout the paper and state some basic concentration inequalities and facts about 
branching processes. In Section 4 we finish the proof of the upper bound on the diameter 
of $D(n,r)$ (Theorem 1.1) assuming some technical estimates. The breadth-first search
procedure that will be used to explore the graph is described in Section 5. In Section 6 we 
study the behaviour of the in-neighbourhoods of D(n, r) by comparing them with Poisson 
Galton-Watson trees, while in Section 7 we study its out-neighbourhoods. In Section 8, we 
prove the technical estimates, completing the proof of the upper bound given in Section 4. 
The proof of the lower bound on the diameter of $D(n,r)$ (Theorem 1.1) occupies Section 9.
We conclude the paper by proving Theorem 1.2 in Section 10. 

2. Motivation and Related Work 

One of our main motivations for the study of $D(n,r)$ comes from the analysis of random
deterministic finite automata (DFAs). In this section we describe the particular problem 
that leads us to study the diameter and stationary distribution of these objects. 

2.1. Learning Random Deterministic Finite Automata. A deterministic finite
automaton (DFA) over an alphabet $\Sigma = \{\sigma_1,\dots,\sigma_r\}$ is given by a set $V = \{v_1,\dots,v_n\}$
and a function $L : [n] \times [r] \to [n]$. We think of the pair $(V,L)$ as specifying a directed
multigraph $D$ with vertices $V$ and edges $\{(v_i, v_{L(i,j)}) : i \in [n], j \in [r]\}$; every vertex of $D$
has out-degree $r$, and the $r$ edges leaving a vertex $v$ are labeled with distinct symbols from
$\Sigma$. In addition, a DFA is equipped with a distinguished vertex $s$ called the initial state,
and with a binary labelling $B : V(D) \to \{0,1\}$; the vertices in $B^{-1}(\{1\})$ are the accepting
states of the DFA. The DFA is formally given by the tuple $Q = (V, \Sigma, L, s, B)$.

Let $\Sigma^*$ denote the set of all finite strings with symbols in $\Sigma$. Words $w = w_1 w_2 \cdots w_t \in \Sigma^*$
correspond to walks $x_0(w), x_1(w), \dots, x_t(w)$ on $V$: $x_0 = s$ and, for $1 \le i \le t$, $x_i$ is
reached from $x_{i-1}$ by following the edge with label $w_i$. We write $Q(w) = x_t(w)$ for
the final state of the walk. The DFA accepts the word $w$ if $B(Q(w)) = 1$. The set
$\mathcal{L}(Q) = \{w \in \Sigma^* : B(Q(w)) = 1\}$ is the language recognized by the DFA. The
languages recognized by some DFA are precisely the regular languages.

To see the connection with random out-regular graphs, observe that we may build a
uniformly random DFA with $n$ labelled states and alphabet of size $r$ as follows. Let $D(n,r)$
be as in the first paragraph of the paper, using the random variables $(L_{i,j} : (i,j) \in [n] \times [r])$.
Then for $(i,j) \in [n] \times [r]$, let $L(i,j) = L_{i,j}$; equivalently, assign label $\sigma_j$ to edge $(i, L_{i,j})$.
Choose the starting state $s$ uniformly at random from $[n]$, and choose $B$ uniformly at
random from the set of functions $f : [n] \to \{0,1\}$.
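The construction of a uniformly random DFA, and the walk that a word induces on it, can be sketched as follows. The function names are hypothetical, states and symbols are 0-indexed, and a word is represented as a sequence of symbol indices.

```python
import random

def random_dfa(n, r, seed=None):
    """Uniformly random DFA: transition table L, start state s, labelling B."""
    rng = random.Random(seed)
    transitions = [[rng.randrange(n) for _ in range(r)] for _ in range(n)]
    start = rng.randrange(n)
    accepting = [rng.randrange(2) for _ in range(n)]
    return transitions, start, accepting

def final_state(transitions, start, word):
    """Follow the edges labelled by the symbols of `word`; return Q(w)."""
    state = start
    for symbol in word:
        state = transitions[state][symbol]
    return state

def accepts(transitions, start, accepting, word):
    """The DFA accepts w exactly when B(Q(w)) = 1."""
    return accepting[final_state(transitions, start, word)] == 1
```

When `word` is drawn uniformly from sequences of a given length, `final_state` returns the endpoint of a simple random walk on the underlying $r$-out digraph started at the initial state.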

DFAs and regular languages play a crucial role in language theory and there is a vast 
literature on algorithms over DFAs, ranging from minimization and equivalence testing, 
to synthesis, learning and composition. 

Learning regular languages from different sources of information is a prominent problem
in computational learning theory [22], which is most often studied within the context of
so-called grammatical inference problems [15]. A central problem in this area concerns the
possibility of learning regular languages under the probably approximately correct (PAC)
learning model introduced by Valiant [33]. Roughly speaking, this asks for an efficient
algorithm such that, when supplied with a large enough sample containing iid strings drawn
from some arbitrary probability distribution $\mu$ on $\Sigma^*$ and labels indicating whether each
string belongs to some hidden regular language, the algorithm outputs a representation of a
regular language (e.g. a DFA) which is close to the hidden regular language in a sense that
depends on the distribution which generated the sample strings. Several results from the
90’s indicate that, in its full generality, PAC learning of DFAs is hard due to complexity- 
theoretic as well as cryptographic reasons [21, 28] (see also the recent strengthened result 
[12]). A natural question to ask in such a scenario is whether there exists a reasonable 
simplification of the problem for which a positive answer is possible. This requires one 
to come up with scenarios that rule out the worst-case problems arising from specially 
crafted regular languages and distributions over examples appearing in the proofs of the 
aforementioned lower bounds. 

One possibility is to study the average case. This approach can be formalized by
considering regular languages defined by random DFAs. In particular, one can ask for an
algorithm that with high probability (as the number of states in the DFA goes to
infinity) can learn the regular language recognized by a random DFA. There exists
evidence suggesting that such a relaxation might not be enough to achieve efficient learning
in general: it was recently shown by Angluin et al. that generic instances of DFAs (as
well as decision trees and DNF formulas) are hard to learn from statistical queries when
examples can be sampled from an arbitrary distribution [3]. Nevertheless, prior to Angluin
et al.’s result it was shown that generic decision trees and generic DNF formulas can be
efficiently learned when samples are drawn according to the uniform distribution [18, 30].

In view of the panorama described in the previous paragraphs, a natural question to 
ask is whether random DFAs can be efficiently learned when sample strings are drawn 
from the uniform distribution. More precisely, one would like to answer the following sorts
of questions. Fix a uniformly random DFA $Q$ with states $[n]$ and alphabet $[r]$. Then fix
$m \in \mathbb{N}$ and let $(x_i, i \ge 1)$ be iid words sampled uniformly at random from $[r]^m$.

(1) Given the sequences $(x_i, i \ge 1)$ and $(B(Q(x_i)), i \ge 1)$, is it possible to construct a
DFA $\hat{Q}$ that recognizes the same language as $Q$ with high probability?

(2) Given the sequences $(x_i, i \ge 1)$, $(Q(x_i), i \ge 1)$ and $(B(Q(x_i)), i \ge 1)$, is it
possible to construct a DFA $\hat{Q}$ that recognizes the same language as $Q$ with high
probability?

In both cases, if the answer is yes then it is natural to ask for efficient algorithms
(average case running time polynomial in $n$, $m$, $r$, and any other parameters involved). The
questions can be weakened by only requiring that $\hat{Q}$ recognizes the same set of words of
length $m$. A further weakening is to only require that $\mathbb{P}(\hat{Q}(y) = Q(y)) \ge 1 - \epsilon$ when $y$ is
uniformly distributed over $[r]^m$.

The results in [1] establish that in order to answer the second question, it would be
sufficient to understand several specific properties of a random walk on a randomly generated
DFA. When a string is sampled from the uniform distribution over $[r]^m$ and is labeled
according to the state that it reaches, the label immediately corresponds to the final state of
a simple random walk of length $m$ over the DFA starting from the initial state. Thus, the
analysis of the algorithm in [1] relies on bounds on the diameter, stationary distribution,
and mixing time of random $r$-out regular digraphs. Similar ideas are what led us to the
study of the problems discussed in the present paper.

Several other properties of random DFAs have been studied, both in learning theory and
in other contexts, using the $D(n,r)$ model. For example, first Korshunov’s group, and later
Nicaud’s group, have studied the probability that random DFAs exhibit particular
structures, mainly motivated by the analysis of sample and reject algorithms for enumeration
of subclasses of automata (see [25] and references therein). Motivated by worst-case
hardness results for learning a DFA, Angluin and co-authors have used properties of random
DFAs to study the problem of learning a generic DFA [2, 3]. The average-case complexity
of DFA minimization algorithms has also received some attention recently [5, 14].
Finally, a series of results have led to a solution of the long-standing Černý conjecture about
synchronization of finite automata in the case of random DFAs [7, 26, 31].

2.2. Diameter and stationary distribution of other random graph models. In
this subsection we describe some previous results on the diameter and the stationary
distribution of certain random graph models and relate them to Theorem 1.1 and to
Theorem 1.2. This provides an intuition for the results we have obtained on the diameter
of $D(n,r)$. We consider the following models of random (di)graphs.

• For $p \in [0,1)$, $G(n,p)$ is the random graph with vertex set $[n]$ in which every edge
is included independently with probability $p$.

• For $d \in \mathbb{N}$, $G(n,d)$ is the random $d$-regular simple graph with vertex set $[n]$ chosen
uniformly at random among all such graphs.

• For $p \in [0,1)$, $D(n,p)$ is the random digraph with vertex set $[n]$ in which every
oriented edge is included independently with probability $p$.

For an undirected graph $G = (V,E)$ and $u,v \in V$ we write $\operatorname{dist}_G(u,v)$ for the minimum
number of edges in a path from $u$ to $v$, or set $\operatorname{dist}_G(u,v) = \infty$ if there exists no such path.
The diameter of $G$ is then defined just as in (1.1). Bollobás and Fernandez de la Vega [9]
studied the diameter of $G(n,d)$ and showed that for every integer $r \ge 2$, we have

\[ \operatorname{diam}(G(n,r+1)) = (1 + o_p(1))\log_r n. \tag{2.1} \]

The diameter of $G(n,p)$ was recently studied by Riordan and Wormald [29], who showed
that for every constant $r > 0$, we have

\[ \operatorname{diam}(G(n,(r+1)/n)) = (1 + 2\eta_r + o_p(1))\log_r n. \tag{2.2} \]

In fact, they proved a stronger result, showing convergence in distribution of the
diameter after appropriate recentering and rescaling. The extra term $2\eta_r$ is essentially due
to the existence of “remote” vertices in the giant component of $G(n,(r+1)/n)$, whose
neighbourhoods are exceptionally small up to distance about $\eta_r \log_r n$.

Our result on the diameter of $D(n,r)$ from Theorem 1.1 can be related to (2.1) and (2.2)
in the following way. Given $u,v \in [n]$, one way to determine $\operatorname{dist}_{D(n,r)}(u,v)$ is to perform an
outward breadth-first search (BFS) starting at $u$, to perform an inward BFS (i.e. following
edges from head to tail) starting at $v$, and to stop at the first time the two searches uncover
a common vertex. (See Section 5.1 for a careful definition of breadth-first search.) This
technique was used by Bollobás and Fernandez de la Vega in [9]. Since the BFS explores
vertices in order of distance, such a procedure is guaranteed to build a shortest path from
$u$ to $v$.
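The two-sided search just described can be sketched as follows. This is an illustrative variant rather than the procedure analysed in the paper: it expands one full BFS layer at a time, always on the side with the smaller frontier (an arbitrary choice), and the names are hypothetical. The digraph is assumed to be given as a list `heads` of out-neighbour lists (0-indexed).

```python
def _expand_layer(adj, frontier, dist_mine, dist_other):
    """Expand one full BFS layer; report the best meeting length, if any."""
    next_frontier, best = [], None
    for x in frontier:
        for y in adj[x]:
            if y not in dist_mine:
                dist_mine[y] = dist_mine[x] + 1
                next_frontier.append(y)
                if y in dist_other:          # the two searches have met at y
                    total = dist_mine[y] + dist_other[y]
                    best = total if best is None else min(best, total)
    return next_frontier, best

def bidirectional_distance(heads, u, v):
    """dist(u, v) via an outward search from u and an inward search from v."""
    if u == v:
        return 0
    tails = [[] for _ in heads]              # reversed digraph for the inward search
    for x, row in enumerate(heads):
        for y in row:
            tails[y].append(x)
    dist_out, dist_in = {u: 0}, {v: 0}
    front_out, front_in = [u], [v]
    while front_out and front_in:
        if len(front_out) <= len(front_in):
            front_out, met = _expand_layer(heads, front_out, dist_out, dist_in)
        else:
            front_in, met = _expand_layer(tails, front_in, dist_in, dist_out)
        if met is not None:
            return met
    return float("inf")                      # one search died out: no path
```

Before each expansion the two explored sets are disjoint, so no path of length at most the sum of the two current depths exists; the first layer that touches the other side therefore closes a shortest path.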

On the one hand, in the outward BFS of $D(n,r)$ starting from $u$, every vertex has
exactly $r$ out-edges when explored. Similarly, in a BFS exploration of $G(n,r+1)$, when a
vertex $v$ is discovered via an edge from one of its neighbours, this leaves $r$ edges to unveil
when $v$ is itself explored (unless $v$ is discovered multiple times, which at least at the start
of the BFS is unlikely). Thus, a BFS of $G(n,r+1)$ looks similar to an outward BFS of
$D(n,r)$.

On the other hand, in the inward BFS of $D(n,r)$ starting from $v$ (or at least near the
start of the process) the number of in-edges arriving at a vertex is roughly distributed
as a Binomial random variable with $n$ trials and success probability $r/n$. Thus, a BFS of
$G(n,(r+1)/n)$ looks similar to an inward BFS of $D(n,r)$.

The preceding paragraphs suggest that shortest paths in $D(n,r)$ are in some sense
hybrids of shortest paths in $G(n,r+1)$ and in $G(n,(r+1)/n)$. This, together with (2.1)
and (2.2), provides some intuition for the value of the diameter of $D(n,r)$ from
Theorem 1.1: it is the average of the limit values in those formulae.

There is interesting related work on distances in graphs with random edge weights. 
We mention in particular the paper of Janson [20] on typical and extreme distances in 
randomly edge-weighted complete graphs, and the subsequent work by Bhamidi and van 
der Hofstad [8], which establishes distributional convergence for the diameter. 

To conclude this section, we discuss the stationary distribution of a simple random 
walk in these other models. While in undirected graphs the stationary distribution (if 
it exists) is completely determined by the degrees of the vertices, this is not the case in 
directed graphs. Cooper and Frieze [13] give a very precise description of the stationary
distribution of $D(n,c/n)$ when $c = c(n) > (1+\epsilon)\log n$, for any constant $\epsilon > 0$, and use
their result to compute the cover time of $D(n,c/n)$. It is worth noticing that for such
values of $c$, both the in-degrees and out-degrees are of logarithmic order and concentrated
around their expected values, which turns out to be very useful for the analysis. It seems
harder to find an interesting question about the stationary distribution of $D(n,c/n)$ when
$c = c(n) < (1-\epsilon)\log n$ since, as in random $r$-in regular digraphs, typically there are
vertices with no out-edges.


3. Notation and preliminaries 

We write $[n] = \{1,2,\dots,n\}$, $\mathbb{N} = \{1,2,\dots\}$, and $\mathbb{N}_0 = \{0,1,2,\dots\}$. The notation $A \subset B$
allows that $A = B$; we write $A \subsetneq B$ for strict containment. Unless we explicitly indicate
otherwise, asymptotic notations will always refer to the case $n \to \infty$. We omit floors and
ceilings when doing so improves readability. All logarithms are natural unless a subscript
specifies otherwise.

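For any two random variables $X$ and $Y$, we use the notation $X \overset{d}{=} Y$ to denote that the
corresponding probability distributions are equal. For random variables $X,Y$, we write
$X \preceq Y$, and say $X$ is stochastically dominated by $Y$, if $\mathbb{P}(X \le t) \ge \mathbb{P}(Y \le t)$ for all
$t \in \mathbb{R}$. We say $X_1,\dots,X_k$ are independently stochastically dominated by $Y_1,\dots,Y_k$ if
$\mathbb{P}(X_i \le t_i, 1 \le i \le k) \ge \prod_{i=1}^{k} \mathbb{P}(Y_i \le t_i)$ for all $t_1,\dots,t_k \in \mathbb{R}$.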

3.1. Digraphs. Let $D = (V(D),E(D))$ be a directed graph. For $S,S' \subset V(D)$ let

\[ E(S,S') = E_D(S,S') = \{(u,v) \in E : u \in S, v \in S'\}. \]

Given $S \subset [n]$, $D[S] = (S, E(S,S))$ is the subgraph of $D$ induced by $S$.

Given $u \in [n]$ and an integer $k \ge 0$, write $N^+_k(D,u) = \{v \in [n] : \operatorname{dist}(u,v) = k\}$
and $N^+_{\le k}(D,u) = \bigcup_{j \le k} N^+_j(D,u)$. Similarly, let $N^-_k(D,u) = \{v \in [n] : \operatorname{dist}(v,u) = k\}$ and
$N^-_{\le k}(D,u) = \bigcup_{j \le k} N^-_j(D,u)$. We write $N^+(D,u) = N^+_1(D,u)$ and $N^-(D,u) = N^-_1(D,u)$.
We also let $d^+_k(D,u) = |N^+_k(D,u)|$, and define $d^+_{\le k}(D,u)$, $d^-_k(D,u)$, and $d^-_{\le k}(D,u)$
correspondingly. We write $N^+_k(u) = N^+_k(D,u)$, etcetera, when $D$ is clear from context.

3.2. Concentration Inequalities. We write $\mathrm{Bin}(N,p)$ to denote a Binomial random
variable with $N$ trials and success probability $p$. We write $\mathrm{Po}(r)$ to denote a Poisson
random variable with parameter $r$. We also write $\mathrm{Ber}(p)$ to denote a Bernoulli random
variable with success probability $p$.

We will use the following version of Chernoff’s bound for large deviations that can be
found in [19].

Lemma 3.1 (Chernoff’s inequality). For any $t \ge 0$ we have

\[ \mathbb{P}(\mathrm{Bin}(N,p) \ge Np + t) \le e^{-\frac{t^2}{2(Np + t/3)}} \tag{3.1} \]

and

\[ \mathbb{P}(\mathrm{Bin}(N,p) \le Np - t) \le e^{-\frac{t^2}{2Np}}. \tag{3.2} \]





We will also use Chebyshev’s inequality: for any random variable $X$ and any $t > 0$,
$\mathbb{P}(|X - \mathbb{E}(X)| \ge t) \le \operatorname{Var}(X)/t^2$, where $\operatorname{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2$.

3.3. Trees and branching processes. In any rooted tree, we view edges as oriented
from child to parent. Fix a rooted tree $T$ with root $v = v(T)$. Then for all $u \in V(T)$,
$N^-(T,u)$ is the set of children of $u$. For $u \ne v$, let $p(u) = p_T(u)$ be the parent of
$u$ in $T$, so $N^+(T,u) = \{p(u)\}$. For $k \ge 0$, let $T_{\le k}$ be the subtree of $T$ induced by
$N^-_{\le k}(T,v) = \{u \in V(T) : \operatorname{dist}_T(u,v) \le k\}$; we view $T_{\le k}$ as rooted at $v$. Also, write
$T_k = N^-_k(T,v) = \{u \in V(T) : \operatorname{dist}_T(u,v) = k\}$.

A plane tree is a rooted tree in which the children of each node have a left-to-right
order. Given a plane tree $T$, there is a canonical labelling of $V(T)$ by distinct elements
of $\{\emptyset\} \cup \bigcup_{i \ge 1} \mathbb{N}^i$ as follows. The root $v$ has label $\emptyset$; its children are labelled from left to
right as $1,\dots,|N^-(T,v)|$. Given $u \in T_k$ with label $w_1 w_2 \cdots w_k$, the children of $u$ are
labelled from left to right as $(w_1 \cdots w_k i,\ 1 \le i \le |N^-(T,u)|)$.

Conversely, given a rooted tree $T$ with $t$ vertices and an ordering of $V(T)$ as $v_1,\dots,v_t$,
say $w \in V(T)$ has index $j$ if $w = v_j$, for $1 \le j \le t$. We view $T$ as a plane tree using
the convention that the children of each vertex are listed from left to right in increasing
order of index. If $V(T) \subset \mathbb{N}$ then we always use the ordering inherited from $\mathbb{N}$. Thus, for
a rooted tree $T$ with $V(T) \subset \mathbb{N}$, and a plane tree $T'$, we say $T$ and $T'$ are isomorphic, and
write $T \cong T'$, if $T$ and $T'$ are identical when viewed as plane trees.

Finally, fix a non-negative, integer-valued random variable $\xi$. A Galton-Watson tree with
branching mechanism $\xi$ is the random, potentially infinite family tree of a branching
process started from a single individual, in which each individual reproduces independently
according to $\xi$ (i.e. the number of offspring of each individual has the distribution of $\xi$).
The random tree is naturally viewed as a plane tree; see [23] for details and a careful
construction. If $\xi$ is $\mathrm{Po}(r)$ distributed we call the tree a Poisson($r$) Galton-Watson tree.
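To make the definition concrete, the sketch below simulates the generation sizes of a Galton-Watson tree. It is an illustration only: the function names are hypothetical, the tree is truncated after a fixed number of generations, and the Poisson sampler uses Knuth's multiplication method, which is adequate for small means.

```python
import math
import random

def poisson(rng, mean):
    """Sample Po(mean) by Knuth's multiplication method (small means only)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def generation_sizes(offspring, max_generations):
    """Sizes of generations 0, 1, ... of a Galton-Watson tree.

    `offspring` is a zero-argument function returning one sample of xi.
    Stops at extinction or once `max_generations` generations were added.
    """
    sizes = [1]                               # generation 0 is the root
    while len(sizes) <= max_generations and sizes[-1] > 0:
        sizes.append(sum(offspring() for _ in range(sizes[-1])))
    return sizes
```

For example, `generation_sizes(lambda: poisson(rng, 2.0), 10)` with `rng = random.Random(0)` simulates the first generations of a Poisson(2) Galton-Watson tree. Generation sizes grow geometrically on survival, so `max_generations` should stay small.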

4. Theorem 1.1: upper bound 

In this section we describe our proof technique for the upper bound in Theorem 1.1,
and prove the theorem assuming two technical estimates. For the remainder of the section
let $D = D(n,r)$ and write $d^-_k(v) = d^-_k(D,v)$, $N^-_k(v) = N^-_k(D,v)$ etcetera.

In order to derive an upper bound on the diameter of $D$ we first show that for any fixed
vertex $v$, with high probability the in-neighbourhood $N^-_k(v)$ is either empty or large for $k$
slightly larger than $\eta_r \log_r n$.

Lemma 4.1. Fix v G [n], let ko{v) = min{A: : d^{v) 0 (0,log'^n)}. Then for every 

e G (0,1/10), there exists (i > 0 such that 

P (ko{v) > {pr + e) log^ n or > log^n^ = . 

Next, if $v \in [n]$ has $N^+(v) = \{v\}$ (i.e., if all edges leaving $v$ are self-loops) then call
$v$ a loop vertex. Let $E_{sl}$ be the event that $D$ contains some loop vertex. Each vertex is
independently a loop vertex with probability $n^{-r}$, so

\[ \mathbb{P}(E_{sl}) = 1 - (1 - n^{-r})^n = O(n^{1-r}). \tag{4.1} \]

Note that if $r \ge 3$, the probability of a given vertex being a loop vertex is $O(n^{-3})$.
This bound is small enough that it would allow union bounds over pairs of vertices, which
would simplify some proofs. Since we aim to prove our result also in the case $r = 2$, we
need to be a bit more careful in our computations.

Proposition 4.2. Fix k,n G N and u,v G [n]. Let Ek = {d^{v) > log^n,d^^(u) < 
log'^n}, and fix a graph H with v G V{H) G [n] such that P(i4[A^^^(u)] = Ff, Ei^, Esl) > 0. 
Then

f‘{dist{u,N^^{v)) > log^n - log^ log^ n, ^^ 51 , | D[N^^{v)] = H) = 0{n~^), 

the preceding bound holding uniformly over k and over all H satisfying the above conditions. 

We prove Proposition 4.2 in Section 8. In the remainder of the section, we finish the
proof of the upper bound from Theorem 1.1, assuming Lemma 4.1 and Proposition 4.2.

Proof of the upper bound in Theorem 1.1. Fix $\epsilon > 0$. We show that
$\mathbb{P}(\operatorname{diam}(D) \le (1 + \eta_r + \epsilon)\log_r n) = 1 - o(1)$. Since $\operatorname{diam}(D_0(n,r)) \le \operatorname{diam}(D(n,r))$, the same bound immediately
holds for $D_0(n,r)$.

Let $k^* = (\eta_r + \epsilon)\log_r n$ and let $\ell^* = \log_r n - \log_r \log_r n$. By (4.1) for every $r \ge 2$, we
have that $\mathbb{P}(E_{sl}) = O(n^{-1})$. So

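\[ \mathbb{P}(\exists u,v \in [n],\ \operatorname{dist}(u,v) \in (k^* + \ell^*, \infty)) \le O(n^{-1}) + \sum_{v \in [n]} \mathbb{P}(\exists u \in [n],\ \operatorname{dist}(u,v) \in (k^* + \ell^*, \infty),\ \overline{E_{sl}}). \]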

Define ko = ko{v) as in Lemma 4.1, and let E = {/cq < k*,dfj^^{v) < log^n}. Then by 
Lemma 4.1, 

P(3u G [n], dist(u, u) G {k* + t, 00 ), E^if) 

<P(:F,:^)+P(3uG [n], dist(n,u) G {k* + t, 00 ), E,E^) 

+ P(3u G [n], dist(u, v) G {k* + i*, 00 ), E, F^sl) , 

Now let Ti be the set of graphs H with v e V(H) such that (u)] = H, E, Esl) > 0. 

Then 

P(3u G [ra], dist(u, u) G {k* +£*, 00 ), E, Esl) 

< sup P(3u G [ra], dist(u,u) G {k* +i*,oo),EsL \ D[Nfl, (u)] = H) 

Hen - ° 

< sup P(dist(u,u) G {k* +1 ,oo),Esl \ D[N~,{v)] = H) . 

H&n 

■ue[n] 

For each H ^ TL there is a (non-random) constant k = k{H) such that if D[N^j^^{v)] = 
H then ko = k{H), so the events D[A^<^^(u)] = H and D[N^j^{v)] = H are identical. 
Furthermore, given that D[N^f^(v)] = H we either have d^{v) = 0 or d^{v) > log^n. In 
the latter, if D[N^j^{v)] = H then E^ occurs, so P(D[A^^^(u)] = Ed, E^, Esl) > 0 , so we 
can apply Proposition 4.2 (with k = k{]T) = ko). In both cases we obtain 

P(dist(u,u) G (r +r,cx)),:^ | D[N^^yv)] = H) 

=p(dist(u, 7 ;) G (r +r,oo),:^ | D[iV<^(u)] = H) 

=0{n-y, 

and conclude that 

P(3n, u G [n], dist(u, v) G {k* + £*, 00 )) 

=0{n~^) + 0{n~^) = 0{n~^). 

u€[n] u,v^[n] 

It follows that with high probability $\operatorname{diam}(D) \le k^* + \ell^* \le (1 + \eta_r + \epsilon)\log_r n$. □
5. Breadth-first search and conditioning

In this section we describe the breadth-first search (BFS) procedures, which are
fundamental to our analysis, and use them to prove a handful of stochastic domination results
for neighbourhood sizes in $D(n,r)$.

5.1. Outward and inward breadth-first search. Fix a digraph $D$ together with an
ordering of its vertices $V(D)$ as $(v_1,\dots,v_n)$. The outward breadth-first search (oBFS)
starting from node $v \in V(D)$ is a deterministic process $((R^+_i(D,v), S^+_i(D,v)), i \ge 0)$,
defined as follows. At time $i$, $R^+_i = R^+_i(D,v)$ is the set of explored vertices and $S^+_i =
S^+_i(D,v)$ is the sequence of discovered but not yet explored vertices; $S^+_i$ is treated as a
first-in first-out queue. Node $w \in V(D)$ has index $j$ if $w = v_j$.

Begin with $R^+_0 = \emptyset$ and $S^+_0 = (v)$. Now fix $i \ge 0$ and suppose $(R^+_i, S^+_i)$ are already
defined. Step $i$ of the process is defined as follows. If $S^+_i = (s_{i,1},\dots,s_{i,j})$ has positive
length then write $u^+_i = u^+_i(D,v) = s_{i,1}$ and $C^+_i(D,v) = N^+(D,u^+_i) \setminus (R^+_i \cup S^+_i)$. List
the elements of $C^+_i(D,v)$ in increasing order of index as $w_{i,1},\dots,w_{i,k}$; it is possible that
$k = 0$. Then set

\[ R^+_{i+1} = R^+_i \cup \{s_{i,1}\}, \quad\text{and}\quad S^+_{i+1} = (s_{i,2},\dots,s_{i,j},w_{i,1},\dots,w_{i,k}). \]

In words, at step $i$, $u^+_i = s_{i,1}$ is explored, and $w_{i,1},\dots,w_{i,k}$ are discovered and added to
the back of the queue for later exploration. If $S^+_i$ has zero length (i.e., $S^+_i = ()$), then
$S^+_{i+1} = S^+_i$ and $R^+_{i+1} = R^+_i$.

Writing $i^+ = i^+(D,v) = \min\{i : S^+_i = S^+_{i+1}\}$, then $R^+_{i^+}(D,v)$ is precisely the set
of vertices $w$ with $\operatorname{dist}_D(v,w) < \infty$. The oBFS tree $T^+(D,v)$ has root $v$ and vertices
$R^+_{i^+}(D,v)$; the children of $u^+_i$ are precisely the vertices $w_{i,1},\dots,w_{i,k}$ newly discovered in
step $i$. We write $T^+(D,v,m)$ for the subtree of $T^+(D,v)$ with vertices $R^+_m(D,v) \cup S^+_m(D,v)$.
Note that if $S^+_m(D,v)$ has length $\ell$ then its elements are precisely $(u^+_{m+i}(D,v),\ 0 \le i < \ell)$,
because oBFS explores these vertices before any others. We therefore have $R^+_m(D,v) \cup
S^+_m(D,v) = \{u^+_i(D,v),\ 0 \le i < m + \ell\}$.
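The oBFS process described above can be sketched as follows. This is an illustration with a hypothetical function name; vertices are 0-indexed and ordered by label, the dictionary `parent` records the oBFS tree, and its key set plays the role of the discovered set $R^+_i \cup S^+_i$.

```python
from collections import deque

def outward_bfs(heads, v):
    """Run oBFS from v; return the exploration order and the tree parents.

    order[i] is the vertex explored at step i, and parent[w] is the vertex
    whose exploration discovered w (None for the root v).
    """
    order = []
    parent = {v: None}                    # keys: explored or queued vertices
    queue = deque([v])                    # the first-in first-out queue S
    while queue:
        u = queue.popleft()               # the head of the queue is explored
        order.append(u)
        for w in sorted(set(heads[u])):   # new vertices in increasing index
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return order, parent
```

After `m` iterations of the loop, `order[:m]` corresponds to $R^+_m$ and the contents of `queue` to $S^+_m$. Running the same function on the reversed digraph gives the inward search.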

In the inward breadth-first search (iBFS) process $((R^-_i(D,v), S^-_i(D,v)), i \ge 0)$, the
sets $C^-_i(D,v)$ and the terminal time $i^- = i^-(D,v)$ are defined in just the same manner
but exploring in-neighbourhoods rather than out-neighbourhoods to discover vertices; in
particular $R^-_{i^-}(D,v) = \{w : \operatorname{dist}_D(w,v) < \infty\}$. We also write $T^-(D,v,m)$ for the subtree
of $T^-(D,v)$ with vertices $R^-_m(D,v) \cup S^-_m(D,v)$.

Observe that using the notation from Section 3.3, we have $T^+_k(D,v) = N^+_k(D,v)$ and
$T^-_k(D,v) = N^-_k(D,v)$ for all $k$.

5.2. Conditioning on neighbourhoods and the BFS exploration in $D(n,r)$. We
next describe the effect of iBFS on the law of $D(n,r)$. For the remainder of the section
we write $D = D(n,r)$ and fix $v \in [n]$. We write $N^-_k = N^-_k(D,v)$, $u^-_j = u^-_j(D,v)$,
$T^-(m) = T^-(D,v,m)$ etcetera. Informally, the point of this section may be summarized
as follows: if vertex $u^-_i$ is discovered at step $j$ then all we know about $u^-_i$ is that it has
an edge to $u^-_j$ and has no edges to $u^-_k$ for $k < j$. We now state and prove some useful
stochastic identities and inequalities which result from this.

Lemma 5.1. Fix m G Nq. Conditional on {{R~, Sfi),0 < i < m), independently for all 
w G [re] we have 
if w ^ U

l + Binfr-l,i^') ^ \E{w, i?" U g-)| ^ 1 + Bin fr - 1, 

\ n — m J \ n 

i/ 1 /; E U S" \ {i;}, and \E{v, R- U 5^)| = Bin (^r, . 


Proof. Recall the canonical construction of $D = D(n,r)$ from the introduction, and for
each $x,y \in [n]$ let $p(x,y) = \min\{q : L_{x,q} = y\}$. Then $p(x,y) \le r$ precisely if there is a copy
of the oriented edge $xy$ in $D(n,r)$, and otherwise $p(x,y) = \infty$.

Now fix $w \in [n]$. Suppose $w \notin R^-_m \cup S^-_m$; then there are no edges from $w$ to $R^-_m$. In
other words, for each $1 \le j \le r$, we have $L_{w,j} \notin R^-_m = \{u^-_i,\ 0 \le i < m\}$. It follows that
the conditional law of $|E(w, R^-_m \cup S^-_m)|$ given $(R^-_i,\ 0 \le i \le m)$ and $(S^-_i,\ 0 \le i \le m)$ is
$\mathrm{Bin}(r, |S^-_m|/(n - |R^-_m|))$. The result follows in this case since $|R^-_m| = m$.

Now suppose w ∈ R_m^- ∪ S_m^- \ {v}; then the parent p_{T^-(m)}(w) lies in R_m^- so satisfies
p_{T^-(m)}(w) = u_j^- for some 0 ≤ j < m. We have p_{T^-(m)}(w) = u_j^- precisely if w has no edges
to {u_i^- : 0 ≤ i < j} but has an edge to u_j^-; equivalently, p(w, u_i^-) = ∞ for each 0 ≤ i < j,
and p(w, u_j^-) = k for some 1 ≤ k ≤ r. The heads of the first (k − 1) out-edges from w
are then uniformly distributed over [n] \ {u_i^- : 0 ≤ i ≤ j}, and the heads of the r − k last
out-edges from w are uniformly distributed over [n] \ {u_i^- : 0 ≤ i < j}.

The index j is determined by ((R_i^-, S_i^-), 0 ≤ i ≤ m). Given that w ∈ R_m^- ∪ S_m^- \ {v},
we thus have

    1 + Bin(r − 1, (|R_m^- ∪ S_m^-| − (j + 1))/(n − (j + 1))) ⪯ |E(w, R_m^- ∪ S_m^-)|
        ⪯ 1 + Bin(r − 1, (|R_m^- ∪ S_m^-| − j)/(n − j)).

Since 0 ≤ j < m, and |R_m^-| = m, the second claim follows. The argument when w = v
is similar but easier. Finally, the independence asserted by the lemma follows from the
independence of the random variables (L_{w,p} : (w,p) ∈ [n] × [r]). □


For the next corollary, recall that d^- = \d^<j\ = ^^)l- 


Corollary 5.2. Fix j ∈ N_0. Conditional on (N_i^-, 0 ≤ i ≤ j), independently for all w ∈ [n]
we have |E(w, N_{≤j}^-)| = Bin(r, d_{≤j}^-/n) if w = v, |E(w, N_{≤j}^-)| = Bin(r, d_j^-/(n − d_{≤j−1}^-)) if
w ∉ N_{≤j}^-, and

    1 + Bin(r − 1, d_j^-/(n − d_{≤j−1}^-)) ⪯ |E(w, N_{≤j}^-)| ⪯ 1 + Bin(r − 1, d_{≤j}^-/n)

if w ∈ N_{≤j}^- \ {v}.

Proof. Apply Lemma 5.1 at time m = d_{≤j−1}^-. □


Corollary 5.3. For all j, q, p ∈ N_0, given that d_j^- = q and d_{≤j}^- = p, d_{j+1}^- = Bin(n − p, 1 −
(1 − q/(n − p))^r) and d_{j+1}^- ⪯ Bin(r(n − p), q/(n − p)).

Proof. If w ∈ [n] \ N_{≤j}^- then E(w, N_{≤j}^-) = E(w, N_j^-). By Corollary 5.2, the number of edges
from w to N_j^- then has conditional law Bin(r, q/(n − p)), so is non-zero with probability
1 − (1 − q/(n − p))^r. The distributional identity follows since |[n] \ N_{≤j}^-| = n − d_{≤j}^- = n − p.

Next, note that Bin(m, 1 − (1 − x)^r) is stochastically dominated by Bin(rm, x). To
see this, observe that the former is the number of columns containing at least one 1 in an
r × m matrix whose entries are iid Ber(x) random variables, while the latter is the law
of the number of ones in such a matrix. The pigeonhole principle then yields the second
claim of the lemma. □
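
The matrix coupling used in the last step can be illustrated with a small simulation. The sketch below is an illustration only; the function name `coupled_counts` is hypothetical. It fills an r × m matrix with iid Bernoulli(x) entries and returns both counts from the same matrix, so the domination holds sample by sample.

```python
import random

def coupled_counts(m, r, x, rng):
    """Fill an r x m matrix with iid Bernoulli(x) entries, column by column.

    Returns (cols_hit, ones): cols_hit is the number of columns containing at
    least one 1, which has law Bin(m, 1 - (1 - x)^r); ones is the total number
    of ones, which has law Bin(r * m, x).
    """
    cols_hit = 0
    ones = 0
    for _ in range(m):
        col = [rng.random() < x for _ in range(r)]
        ones += sum(col)
        cols_hit += any(col)
    return cols_hit, ones
```

Every column counted in `cols_hit` contributes at least one to `ones`, so `cols_hit <= ones` always holds, which is the stochastic domination.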
Recall that C_i^-(D,v) = N^-(D, u_i^-) \ (R_i^- ∪ S_i^-) is the set of vertices discovered at step
i of iBFS.

Corollary 5.4. Fix i, q ∈ {0,1,...,n}. Conditioned on |S_i^-| = q, we have

    |C_i^-(D,v)| = Bin(n − i − q, 1 − (1 − 1/(n − i))^r).

We omit the proof since it is very similar to those given above. 

The next lemma formalizes the intuitively clear picture that it is unlikely for the early 
stages of iBFS to encounter a fixed, small subgraph of D not containing the starting vertex. 

Lemma 5.5. Fix a digraph G with V(G) ⊂ [n] and v ∉ V(G). Fix s ∈ N and let
i_0 = inf{i : |R_i^- ∪ S_i^-| > s}. Then

    P((R_{i_0}^- ∪ S_{i_0}^-) ∩ V(G) ≠ ∅ | D[V(G)] = G) ≤ r(s + 1)|V(G)| / (n − |V(G)| − s).

Proof. Since |R_i^-| = i, we clearly have i_0 ≤ s + 1. Writing τ = inf{i : (R_i^- ∪ S_i^-) ∩ V(G) ≠ ∅},
the probability we aim to bound is thus at most

    P(τ ≤ i_0 | D[V(G)] = G) = Σ_{i=0}^{s} P(τ = i, i_0 ≥ i | D[V(G)] = G)

        ≤ Σ_{i=0}^{s} P(τ = i | i_0 ≥ i, τ ≥ i, D[V(G)] = G).

Given that D[V(G)] = G, there are r|V(G)| − |E(G)| edges from V(G) to [n] \ V(G);
the heads of such edges are uniformly distributed over [n] \ V(G). For i > 0, given
that (R_{i−1}^- ∪ S_{i−1}^-) ∩ V(G) = ∅, the heads of these edges are uniformly distributed over
[n] \ (V(G) ∪ R_{i−1}^- ∪ S_{i−1}^-). Thus,

    P((R_i^- ∪ S_i^-) ∩ V(G) ≠ ∅ | R_{i−1}^-, S_{i−1}^-, D[V(G)] = G, (R_{i−1}^- ∪ S_{i−1}^-) ∩ V(G) = ∅)

        ≤ r|V(G)| / (n − |V(G)| − |R_{i−1}^- ∪ S_{i−1}^-|),

since, in this situation, the only way that (R_i^- ∪ S_i^-) ∩ V(G) ≠ ∅ is in the case that some
edge with tail in V(G) has as a head the vertex u_{i−1}^-.

Given that i_0 ≥ i and τ ≥ i we indeed have (R_{i−1}^- ∪ S_{i−1}^-) ∩ V(G) = ∅, and also have
|R_{i−1}^- ∪ S_{i−1}^-| ≤ s, so P(τ = i | i_0 ≥ i, τ ≥ i, D[V(G)] = G) ≤ r|V(G)|/(n − |V(G)| − s). Using
this bound in the above sum, the result follows. □

Finally, we require the following, rather simple result for oBFS. 

Lemma 5.6. Fix m ∈ N_0. Conditional on R_m^+ and S_m^+, independently for all w ∈ [n] \ R_m^+
we have

    |E(w, R_m^+ ∪ S_m^+)| ⪯ Bin(r, (|R_m^+| + |S_m^+|)/n) ⪯ Bin(r, (rm + 1)/n).

Proof. We omit the proof of the first inequality, which parallels that of Lemma 5.1. For
the second, note that |R_m^+ ∪ S_m^+| ≤ rm + 1 since D is r-out regular. □

6. In-neighbourhoods: Technical Lemmas

In this section we gather a few basic estimates that describe the size and structure of
in-neighbourhoods of vertices in D = D(n,r).

The following result controls the growth of the in-neighbourhoods in D(n,r).







Proposition 6.1. For all a > 0, 

P G [n], 3j > 0 such that d~{v) > (r + a)^ logl nj = 0{n~‘^) . 

Proof. Fix i; G [n]. We prove that 

P (3j > 0 such that dj{v) > (r + aY log^ nj = 0{n ~^); (6.1) 

a union bound over v G [n] then proves the proposition. 

For j > 0, let Ej = {d~ {v) < (r + aY log^ n}. Then 

n 

P (3j > 0 such that dj{v) > (r + aY log^ = P ^ ^{Ej \ Hji^jEj/) . 

(6.2) 

By Corollary 5.3, for every p > q and every a > 0, 

^{dj (-y) > a I dJ_Yv) = q, d<j_i(u) = p) < P ^Bin (^r(n - p), ^ • 

Note that r{n—p)- ^Y^^_^_^ < rq. Set o = {r+aY log^ n, and observe that if F'j-i occurs then 

dj_i{v) = q < qo '.= {r + aY~^ log^ n. Finally, for such q we have a = {r + a)qo > {r + a)q, 
so 

P(£'j I Dji^^jEji) = F{d~{v) > a \ r\j/<^jEji) 

< sup F{dJ (u) > a I dJ_Yv) = q, d<j_i{v) = p) 

<i<qo,p 

< sup F{dJ {v) >rq + aq\ dJ_Yv) = q, d^j_Yv) = p) 

<?<90,P 

2 2 

< sup e~2(r+a/3)9 = n) _ ^ 

<i<go 

where we used the Chernoff bound (3.1). Using this bound in (6.2) proves (6.1). □ 

The next lemma controls the probability that the sequence (d_k^-(v), k ≥ 1) exhibits a
large decrease in value for relatively small values of k.

Lemma 6.2. Fix u G [n]. Uniformly in k < 0.991og^n and lo > log^ n, we have 

^ d^iv) <u}) = 0{n~^). 

Proof. Fix k and u as above, and u G [n]. Let r = min{j : d~{v) > uY/k\. If d<fc(u) > uY 
and d^iv) < uj, then t < k, so 

P(d<fc(^^) > 4(u) < w) = P(d<fc(u) > d^(v) <uj,T <k) 

k-l 

= Yuj‘^,dY(v) <UJ,T = j) 

i=i 

fe-i 

<'^F(d~(v} > Ld^/k,dY(v) < u>) . (6.3) 

i=i 

Let a = min{i > r : d~^^(v) < dflv)}. For any j < k, if d~{v) > lY/ k and d^{v) < u, 
then a < k, so 


F{dj (v) > oj‘^/k,dj^ (v) < oj) < F{dj (v) > ui‘^/k, a < k) . 


(6.4) 
Now fix a > 0 small enough that (r + a)^ log^ n < for n large; this is possible by our

choice of k. Also, for £ G N let Ef = {Vi < i, d^{v) < (r+ «)* log^ n}, and let E = n£>i 
By Proposition 6.1, we have = 0{n~^), so for all 1 < j < A; — 1, 

¥{d~{v) > uJ^/k, a < k) < F{E) +F{dJ{v) > uj'^jk, a < k, E) 

k-l 

< 0{n-^) + ^P((i-(u) > u^/k, a = i, E) 

e=j 

k-l 

< 0{n~'^)+ '^F{dj{v) > /k, dj'_^_^{v) < dj(v), Eg) 

i=j 

k-l 

< 0(n“^) + < dj{v) I d7(u) > ui'^/k, Eg) . (6.5) 

e=j 


On Eg we have d')^{v) < (r + log^ n < (r + a)^ log^ n < 

Now fix 0 < O' < p < and let X be distributed as Bin — p, 1 — ^1 — ^-p+q 

Using that (1 — x)* < 1 — ix + for x G (0,1) and i G N, for we have 

iiy ' 


n — p + q n — p + q 



= rq ■ 


{n — p){n — p — ^^q) 
(n — p + q)'^ 


the last inequality for n large since p < Now write qo = uP'/k and po = (x + 

logj, n < By Corollary 5.3 and the Chernoff bound (3.2), we have 

^{dg^iiv) < d'){v) I d'){v) > up/k, Eg) 

< sup F{dJ_^_^{v) < dj (u) I dj (u) = q, d)^{v) = p) 

<I0<IJ<P<P0 

< sup F{dJ^^{v) <q\d'){v) = q,d)f,{v) = p) 

<J0<IJ<P<P0 

< sup F{X < q) 
qo<q<p<po 

< sup P (A < E(X) — (2r/3 — l)g) 

<J0<IJ<P<P0 

(2r-/3-l)^q^ TO 

< sup e 2(r(j+{2r/3-l)9/3) < g 187-+2 ^ 

<J0<IJ<P<P0 


in the last line using that r >2. Finally, go = oj'^/k > log^n, so e 'Jo/(i 8 r+ 2 ) _ Ay 
Combining the preceding inequality with (6.3), (6.4) and (6.5) yields 


k—1 k—1 


P(d<fc(u) > uP,dj^ (v) <uj)< ^ ^lP((^£+i(x) < df, {v) I df, (u) > uj^/k, E) + 0{kn ^) 

j=ii=j 

< 0{Pn~‘^) = 0(n~^). 


□ 


The next lemma compares the law of T_{≤k}^-(v) to that of a Poisson(r) Galton-Watson tree.
A similar result, in the setting of undirected graphs, can be found in [29, Lemma 2.2].

Lemma 6.3. Let r ≥ 2 and let T' be a v-rooted plane directed tree where all edges point
to the root. Suppose that |V(T')| ≤ n/2. Then, for any k ≥ 0 we have

    P(T_{≤k}^-(D,v) ≅ T') = e^{O(|V(T')|^2/n)} P(T_{≤k} = T'),

where T is a Galton-Watson branching tree whose offspring is Poisson with parameter r.


Proof. Fix k ∈ N_0 and a plane tree T' of height at most k, and write t = |V(T')|. Recall
the canonical labelling of V(T') with labels from ∅ ∪ ⋃_{i≥1} N^i introduced in Section 3.3.
Consider the iBFS procedure on T' started at its root v. To make sense of this, we must
specify the order in which the children of a vertex u are added to the set of discovered
vertices. We use the left-to-right order: so if u is explored at step i (i.e. u_i^-(T',v) = u)
then the rightmost child of u is the last element of S_{i+1}^-(T',v).

Let a_i = |C_i^-(T',v)| be the number of children of u_i^-(T',v), and let s = |V(T'_{≤k−1})| be
the number of vertices of T' at distance at most k − 1 from the root. In order to check
if T_{≤k}^-(D,v) and T' are isomorphic, it suffices to perform s steps of the iBFS exploration
from v in D. We then have


    P(T_{≤k}^-(D,v) ≅ T') = P(|C_i^-(D,v)| = a_i, 0 ≤ i < s)

        = Π_{i=0}^{s−1} P(|C_i^-(D,v)| = a_i | |C_j^-(D,v)| = a_j, 0 ≤ j < i)

        = Π_{i=0}^{s−1} P(|C_i^-(D,v)| = a_i | |S_i^-(D,v)| = 1 + Σ_{j=0}^{i−1} (a_j − 1)),

where the last line is due to the symmetry of the model.

Writing q_i = 1 + Σ_{j=0}^{i−1} (a_j − 1), by Corollary 5.4 we then have


IPdC** iD,v)\ = at I \Si {D,v)\ = Qi) 


n-i- Qi 

di 


1 - 1 - 


n — I 


r\ ai 


1 - 


1 


r{n-qi-ai-i) 


{n-i- qiT^ ( 

af. 


n — i 


a-i 


1 - 


n — i ^ 

^ \ r{n-qi-ai-i) 


n — I 


Now let T be a Poisson(r) Galton-Watson tree; write ρ for the root of T. Build T via
iBFS starting from ρ. In this manner, we may couple T with a sequence (ξ_i, i ≥ 0) of iid
Po(r) random variables so that for 0 ≤ i < |V(T)| we have |C_i^-(T,ρ)| = ξ_i. It follows
that

    P(T_{≤k} = T') = P(ξ_i = a_i, 0 ≤ i < s) = Π_{i=0}^{s−1} P(ξ_i = a_i) = Π_{i=0}^{s−1} e^{−r} r^{a_i}/a_i!.
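
As a concrete illustration of this product formula, the following sketch evaluates the probability that a Poisson(r) Galton-Watson tree has a prescribed sequence of offspring counts a_0, ..., a_{s−1} in breadth-first order. The function name is hypothetical and the snippet only evaluates the product of Poisson probabilities.

```python
import math

def gw_shape_probability(offspring_counts, r):
    """Return the product over i of P(Po(r) = a_i) = e^{-r} r^{a_i} / a_i!."""
    p = 1.0
    for a in offspring_counts:
        p *= math.exp(-r) * r**a / math.factorial(a)
    return p
```

For example, `gw_shape_probability([2, 0, 0], 2)` is the probability that the root has exactly two children and both are childless.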

Using that 1 -|- x < e*, this gives 


F{\C-{D,v)\ = a,\\Sf{D,v)\ 
P(^i = ai) 


M (n-i - 

(n — i)“» 




r(n-qi-ai-i) 


e-®' 


Since i + qi <t < n/2, s <t, we have 

qi + af ^ t + aj ^ ^ 

^ n — i — qi ~ ^ n/2 v n / 

It follows that 

P(r-,(Ax) = a, I |gr(D,7;)| = g,) oifin) 

P(r<fc = T') P(Ci = af) 
Lemma 6.4 ([29], Lemma 2.1). Let T be a Poisson(r) Galton-Watson tree. There exist
constants c, C > 0 such that for every ω ≥ 2 and k ≥ 1 we have

    c · min{(r(1 − λ_r))^{k−k'}, 1} ≤ P(0 < |T_k| < ω) ≤ C (r(1 − λ_r))^{k−k'},

where k' = ⌊log_r ω⌋.

Recall that the probability of survival in T is P(|T_k| > 0 for all k ≥ 0) = λ_r ∈ (0,1). Essentially,
the preceding lemma states that given the branching process survives for the first k
generations, the probability that |T_k| < ω decays exponentially in k (provided that ω is small
enough with respect to k). The final and principal result of this section is to prove a
corresponding bound with d_k^-(v) in place of |T_k|.
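
The constants involved are easy to evaluate numerically. The sketch below is an illustration under stated assumptions: it uses the standard fact that the survival probability of a Poisson(r) Galton-Watson process is the largest solution of 1 − λ = e^{−rλ}, and it reads the identity used later in the paper as log_r(r(1 − λ_r)) = −1/η_r. The function names are hypothetical.

```python
import math

def survival_probability(r, iterations=200):
    """Largest solution of 1 - lam = exp(-r * lam), by fixed-point iteration from lam = 1."""
    lam = 1.0
    for _ in range(iterations):
        lam = 1.0 - math.exp(-r * lam)
    return lam

def eta(r):
    """eta_r under the assumed identity log_r(r * (1 - lam_r)) = -1 / eta_r."""
    lam = survival_probability(r)
    dual = r * math.exp(-r * lam)   # equals r * (1 - lam_r) at the fixed point
    return -1.0 / math.log(dual, r)
```

Under these assumptions `eta(2)` is roughly 0.77, in line with the bound η_r < 4/5 used in Section 8.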

Proposition 6.5. For every v G [n], k < 0.99 log^ n and log^ n < uj < 

P(0 < d^ (v) < uj) = (1 + o(l))P(0 < |7fc| < w) -I- 0(n~^) . 

Proof. Write Zk = \Tk\. We first prove an upper bound on P(0 < d^{v) < uj). By 
Lemma 6.3, 

P(0<4(u)<cu)= ^ ¥(T<k{D,v)^T') 

{T':|r'|G(0,a;)} 

q ( lyWiiiA 

< X] ® ^ " h{T<k = T') + ¥{d^^{v)>uo\d^{v)e{Q,oj)) 

{T':|T'|e(0.u.) 

|T'|<nl/3} 

< (1 -|- o(l))P(0 < Zk < cu) + 0(n , 

where in the last inequality we used Lemma 6.2. 

We now turn to the lower bound. A similar argument to that above gives 

k 

P(0 < dfc (u) < w) > (1 + o(l))P(0 < Zk < uj) — P(^ Zj > uj"^, Zk < uj). 

j=0 

Bounding the second probability is straightforward. First, fix j < k and a, b ∈ N. Given
that Z_j = a, by the branching property, each of the a subtrees of T rooted at a node
in T_j survives independently with probability p := P(|T| = ∞). Note that p > 0 is
independent of n. But Z_k is at least the number of such subtrees which survive, so
P(Z_k < b | Z_j = a) ≤ P(Bin(a,p) < b).

Finally, if '^^^^Zj > uf^ then maxo<j<fc Zj > uJ^jik P 1). In other words, letting 
jo = inf{j : Zj > uj‘^/{k + 1)}, we must have jo < k. It follows from the preceding 
paragraph (conditioning on the value of jo < k) that 

k 

P(y~^ Zj > Zk < uj) < P(Bin(u ;^/(k + l),p) < uj) = 0{n~^), 

j=0 

the final inequality by a Chernoff bound since uj’^jifk + 1) > culog^n. The proposition 
follows. □ 

7. Out-neighbourhoods: Technical Lemmas

Recall that E_sl is the event that D contains no loop vertices. As in the statement of
Proposition 4.2 we now fix /c € [re], let E = {d'^{v) > log^re,(j<^(u) < log'^re}, and fix a 
graph H with v € V{H) C [re] such that P(iA[A^^^(u)] = H,E,Esl) > 0. It is useful to 
write B = Nff[H^v)] note that this is a deterministic set since H is deterministic, and on 
D[N^f^] = H we have Nj^ (v) = B. Let h = n − \V{H)\ + \B\. By the assumptions on H
we have h > n — log^ n + log^ n. 

For any event A, write ¥^{A) = P(^ | D[N^i^] = H). The following fact describes the 
distribution of D under P^. Its proof follows from straightforward considerations and is 
omitted. 

Fact 7.1. Given that D[N_{≤k}^-] = H, the conditional distribution of D(n, r) is that of the
graph D defined as follows. First, D[V(H)] = H.

Next, independently for each w ∉ V(H), let (L_{w,1}, ..., L_{w,r}) be a vector chosen
uniformly at random from the h^r vectors (s_1, ..., s_r) ∈ (([n] \ V(H)) ∪ B)^r. Then for each
i ∈ [r] add a directed edge from w to L_{w,i}.

Finally, for w ∈ V(H), let t_w = r − |E_H(w, H)|; this is the number of edges with tail
w and head not in H. Independently for each w ∈ V(H), let (L_{w,1}, ..., L_{w,t_w}) be
a vector chosen uniformly at random from ([n] \ V(H))^{t_w}, and for each i ∈ [t_w] add a
directed edge from w to L_{w,i}.
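
The construction in Fact 7.1 translates directly into a sampling routine. The following is a minimal sketch, not the authors' code: vertices are labelled 0, ..., n−1, the function name `sample_given_H` is hypothetical, and H is passed as a dictionary mapping each vertex of H to the heads of its edges inside H (an encoding chosen here for illustration).

```python
import random

def sample_given_H(n, r, H_edges, B, rng):
    """Sample out-edge heads for every vertex, following the recipe of Fact 7.1.

    H_edges: dict mapping each vertex w of H to the list of heads of its edges
             inside H (with multiplicity).
    B:       the subset of V(H) that vertices outside H may still point to.
    """
    V_H = set(H_edges)
    outside = [x for x in range(n) if x not in V_H]
    allowed = outside + sorted(B)   # ([n] \ V(H)) ∪ B, a set of size h
    heads = {}
    for w in range(n):
        if w in V_H:
            t_w = r - len(H_edges[w])   # edges with tail w and head not in H
            heads[w] = list(H_edges[w]) + [rng.choice(outside) for _ in range(t_w)]
        else:
            heads[w] = [rng.choice(allowed) for _ in range(r)]
    return heads
```

Vertices outside H never point into V(H) \ B, which is exactly the information revealed by conditioning on the in-neighbourhood.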

For the remainder of the section, fix u ∈ [n] and write N_i^* = N_i^+(u) \ V(H), d_i^* = |N_i^*|.
We continue with a simple lemma.

Lemma 7.2. We have F^{dl < 5 ,NYYu) n V{H) = = 0(n-3) 

Proof. If n V{H) = 0 then d^ = d^. In this case, since r > 2, it is a simple 

combinatorial exercise to check that if also ^5 < 5 then D[NfY has at least two more edges 
than vertices. For any fixed digraph D with at least two more edges than vertices and with 
no self loops, it is easily seen that P^(I 1 [A^< 5 ] = D, FV{H) = 0) = 0{n~Y- (This 

is not true for digraphs with self-loops if r = 2 ; the probability v itself is a loop vertex 
is 0{n~Y and in this case d\ = 0.) The number of isomorphism classes of digraphs with 
diameter at most 5 and maximum out-degree r is bounded, and the result follows. □ 

We next show that with high probability, each generation N* is approximately r times 
larger than the last, until j is nearly (log^ R')/ 2 . 
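
The growth by a factor of roughly r per generation can be observed empirically. The sketch below is an illustration only: it works in the unconditioned digraph D(n,r) (so it ignores the conditioning on H and the removal of V(H)), labels vertices 0, ..., n−1, and uses hypothetical function names.

```python
import random

def sample_r_out_digraph(n, r, rng):
    """heads[i] lists the r iid uniform heads of the out-edges of vertex i."""
    return [[rng.randrange(n) for _ in range(r)] for _ in range(n)]

def out_generation_sizes(heads, u):
    """Sizes of the successive out-BFS generations N_0^+(u), N_1^+(u), ..."""
    seen = {u}
    frontier = [u]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        nxt = []
        for w in frontier:
            for x in heads[w]:
                if x not in seen:   # a repeated head is a "conflict" and adds nothing
                    seen.add(x)
                    nxt.append(x)
        frontier = nxt
    return sizes
```

Each generation has size at most r times the previous one, with equality exactly when no conflict occurs among the revealed heads.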

Lemma 7.3. Let a = inf{i > 5 : d* < r*“^-|-5}. Then {5 < a < (log^n)/ 8 ) = 0{n~Y- 

Proof. Fix j >5. If ^5 > 5 and d^i > rd* — 4 for every 5 < i < j, then by induction 
dj > + 5. We thus have 

P^(5 <a<j)<F^ (dl > 5, |J{|d*| < + 5}] < < rd* - 4). 

V i=5 ) i=5 

Now fix 5 < i < J. Condition on and recall that the random variables ■ w G 

Nf , m G [r]} are the heads of edges from vertices in N*. Reveal the values of these random 
variables one-at-a-time; say a conflict occurs if Lw^m G Nf- U V(H) or for 

a previously revealed If < rd* — 4 then at least 4 conflicts occur. 

Under P^, the random variables L^j^m are independent and uniform over ([n]\U {H))UB. 
When is revealed there are less than + |. 6 | locations that can cause a conflict, 

since so the probability of a conflict is less than (r®+^ -|- \B\)/h. The set 

■ w G N*,m G [r]} has size at most and \B\ < log^n; it follows that 

< rd* - 4) < P (Bin(r®+\ (r®+2 + |B|)/h) > 4) 

-|- log^n 
4 ) ' V h 



\ ^ C (r®® -b log^® n) 

J ~ 
where in the last inequality we used that h > n − log^ n. For j < (log^ n)/8 we thus have


t>h ( j* 


dl > 5, [Jd^ I < + u) < g£(!f:±i2i!!!l) = o(„-»). 


□ 


i=b 


i=b 




The third lemma of the section shows that out-neighbourhoods continue to grow rapidly 
until they reach size close to n/logn. 

Lemma 7.4. There is C" > 0 such that for all i with r* < n/ log^ n — 2 log'^ n, 


n>H 




Proof. We have 


< g-C" log2 n 


> log3n,d*+i < rd* ■ (^1 

< sup P^ < rd* • 

ae[log® n,r®+l] V 


2r^ \ 

log^ny 

2r2 
log^ n 



Condition on Nf-, and reveal the random variables {L^u^rn ■ w G N*,m G [r]} one-at- 
a-time as in the previous proof. When is revealed there are less than + |.B| 

locations that can cause a conflict, so under P^ the probability of a conflict is at most 
(r*+2 _|_ \B\)/h. If d*^i < rd* — t then at least t conflicts occur, so we obtain 






2r2 \ 
log^ n / 



< P 




y.i+2 


+ \B\ 


n 


2r^a \ 
log^n/ 


Using that r* < n/log^n — 21og'^n and that \B\ < log'^n and h > n — log'^n, it is 
straightforward to verify that ar{P^‘^ + \B\)/h < r^a/log^n. A Chernoff bound then 
gives 

P^ (^<+1 < rd* ■ (l - I d* = , 

for some constant C = C'{r). The latter inequality follows since a > log^ n . □ 


The following is an easy consequence of the preceding lemma, and concludes the section. 


Corollary 7.5. Let j* = 3 log^ log^ n -|- 5 and let I* = log^ n — log^ log^ n — 1. Then there 
are c, C > 0 such that 


P-^ (d*, > \ogln,d}. < . 

V log^ny 

Proof. If d* > log^n and > rd* ■ then d*_^_^ > log^n. Since i* — j* = 

log^ n — 4 log^ log^ n — 6 we also have 


1 - 


2r2 


log^ n 


r-i* 


w-i* 


log^ n > ( 1 — 


2r2 


log^ n 


log^n-l 


n 


.,-2^ 


' log^ n 


• log^ n > 


n 


log^ n ’ 


where we used that (1 — 6/x)^ ^ > e With c = *^^^6 , it follows from the preceding 
inequalities that if d** > log^ n but d|* < cnj log^ n then there is i G [j*,i* — 1] such that 
d* > log^ n and d*j^^ < rd*


1 - 


2r^ 


By Lemma 7.4, there exists some C such that 


(d*. > log? n, d*,, < ^ P^ (d* > log? n, < rd* ■ 

\ / 2 — ^ ' 

< (r -j*) 



□ 


for some constant C < C. 


8. Upper bound on the Diameter 

In this section we prove Lemma 4.1 and Proposition 4.2 from Section 4. Throughout
the section, we fix u, v ∈ [n].

Proof of Lemma 4.1. It suffices to prove the lemma assuming ε < 1/10. Let k* = (η_r + ε) log_r n,
and let δ = ε/(2η_r). An easy computation shows that η_r < 4/5 for every r ≥ 2, so
k* < 0.9 log_r n. Recall the definition k_0 = k_0(v) = min{k : d_k^-(v) ∉ (0, log^4 n)}. If k_0 > k*
then 0 < d_{k*}^-(v) < log^4 n, so by Lemma 6.4 and Proposition 6.5 we have

P(A:o > k*) = P(0 < < log'^n) 

< (1 + o(l))C • (r(l - A^))^‘-4i°gDogn ^ o{n-^) 

= O |^,.-(l+2<5+o(l))log,.n^ ^ 0{n-^) 

where we used that 1 + log_r(1 − λ_r) = −η_r^{-1}. For all i < k_0 we have d_i^-(v) < log^4 n,
so if ko < k* then < A;*log^n < log^n. In this case, for n large, to have 

d^k^iv) > log^n we must have > (log^n)/2. It follows by Corollary 5.3 and a Chernoff 
bound 


P(A:o < k*,d^ko(v) > (log^n)/2) 

< sup sup sup P(d/'_^^(i;) > (log^n)/2 | (u) = q,df-k{v) = p) 

k<k* — l g<log^ n p<log® n 

<P(Bin(rn/2,21og'^n/n) > (log^n)/2) 

<g-(log’'n)/8_ 

Combining the two preceding bounds, the lemma follows. □ 

The proof of Proposition 4.2 occupies the remainder of the section. Let τ = min{j ≥
1 : N_j^+(u) ∩ N_{≤k}^-(v) ≠ ∅}; so in particular dist(u, v) = τ + k.

Lemma 8.1. Fix e > 0 and let t' = min{j > 1 : \N^ {u) \ V{H)\ > en/logn}. Then for 
n large, 

P'^(r >r' + l) 
Proof. First,

F{D[N^,iv)]=H,T>T' + l) 

= ^ = H, D[N+^,iu) \ Vm = F, r > r' + 1) 

F 

= Y,nD[N<kiv)] = H,D[NtA'^) \ V{H)] = F) 

F 

• P(t > r' + 1 I F[iV<J = H, D[N+^,{u) \ V{H)] = F). 

where the sums are over graphs F with V{F) n V{H) = 0, such that u E V{F) and such 
that, for some £ > 0, V{F) = N^{F,u) and i = min{z : |iV+(F, r)| > en/logn}. 

We now bound the final probability. Under such conditioning, the out-edges from
are uniformly distributed over ([n] \ V{H)) U B. There are more than r(en/logn) such 
out-edges; to have t > t' + 1 the heads of such edges must all avoid B; so 

/ I D| \'■(Wiogn) 

P(r > r' + 1 I D[NZ^{v)] = F,F[iV|^,(u) \ V{H)] = F) < (^1 - ^j 


< 1 - 


log"^ n 


n 


erinj log n) 


the last inequality since |F| > log^n and h < n. Using that 1 — x < e ^ this gives 
P(F[iV<,] = H,t>t' + 1)< e—j;P(F[iV<,(i;)] = H,D[N+^,{u) \ V{H)] = F) 

F 

< = H). 


The result follows. 


□ 


Proof of Proposition f.2. Recall that we set I* = log^ n — log^ log^ n — 1, and the notation 
N* = N^{u) \ V{H), d* = I AT I from Section 7. Once n is large enough that i* + 1 > h 
we have 

P^(dist(rt,iV<;.(u)) > r+l,:^) =P^(dist(u,A^<;.(i;)) > t+l,N^^{u)^V{H) = 0,;^), 
and we focus on the latter probability. It is convenient to use the shorthand F = {A^< 5 (M)n 

u(F) = 0}nl^. 

Now take e E (0,c), where c is the constant from Corollary 7.5. and let t' be as in 
Lemma 8.1. Then by that lemma, 

P^(dist(R,Ar-^(u)) > r + 1,F) < P^(t' > r,F) + 


We bound the second probability by 




[t' > r, F) < P" (F > r, d** > log;^ n) + P" [d*, < log;^ n, F). 

The first term on the right is at most by Corollary 7.5. We further divide the 

second as 

P^(d** < log3 n, F) < P'^(d*. < log3 n, dg > 5) + P'^(d^ < 5, F). 

Recall that a = inf{z > 5 : d* < -|- 5}. If d** < log^ n then d^* < if also 

dg > 5 then 5 < u < j*. Lemma 7.3 then implies that the hrst probability on the right is 
0{n~^). By the definition of F and by Lemma 7.2, the second probability is also 0{n~^). 
Combining all these bounds we obtain P^(dist(u, A^<^(u)) > I* + 1,Fsl) = 0(n“^), as 
required. □ 
9. Lower Bound on the Diameter

In the same spirit as in the previous section, we write k* = k*(n, ε) = (η_r − ε/2) log_r n
and ℓ* = ℓ*(n, ε) = (1 − ε/2) log_r n. In order to derive a lower bound on the diameter of
D we will show that for every ε > 0 there exist u, v ∈ [n] such that dist(u, v) ≥ k* + ℓ*.
Together with the proof in Section 4, it concludes the proof of Theorem 1.1.

Definition 9.1. For v G [n], let ki = ki{v) = min{A; : d'j£{v) > log'^n}; this is oo if 
d~^{v) < log'^n for all k. A vertex v is an e-flag (or simply a flag, if e is clear from context) 
if ki G [A;*,oo), < log’^n, and is a tree. We write F = F{e) C [n] for 

the set of e-flags. 

The condition that D[N_{≤k_1}^-(v)] is a tree means that along any shortest path from N_{k_1}^-(v)
to v, at each node w there are (r − 1) possible “wrong turns” that lead to [n] \ N_{≤k_1}^-(v).
We will use this when bounding π_min in Section 10.

The first lemma shows that whp there are no flags outside D_0 = D_0(n, r), the attractive
strongly connected component of D(n,r).

Lemma 9.2. For every ε > 0, P(F(ε) \ V(D_0) ≠ ∅) = o(1).

Proof. Fix ε > 0 and write F = F(ε). If D_0 is attractive then with high probability every
vertex v with max_{u∈[n]} dist(u,v) < ∞ satisfies v ∈ V(D_0). Since D_0 is attractive whp [17],
in order to show P(F ⊂ V(D_0)) = 1 − o(1), it suffices to show that whp, for all v ∈ F and
u ∈ [n] we have dist(u, v) < ∞.

Let 𝒯(v) be the set of digraphs T with v ∈ V(T) and V(T) ⊂ [n] such that if
D[N_{≤k_1}^-(v)] = T then v is a flag. By the definition of a flag, all the elements of 𝒯(v)
are rooted at v. If D[N_{≤k_1}^-(v)] = T then N_{≤k_1}^-(v) contains no loop vertices. It follows that
𝒯(v) is precisely the set of graphs T such that P(D[N_{≤k_1}^-(v)] = T, v ∈ F, E_sl) > 0.

For T ∈ 𝒯(v) we thus have

    P(D[N_{≤k_1}^-(v)] = T, v ∈ F, E_sl) = P(D[N_{≤k_1}^-(v)] = T, E_sl).

It follows that 


    P(∃u, v ∈ [n] : v ∈ F, dist(u,v) = ∞)

        ≤ P(E_sl^c) + Σ_{u,v∈[n]} P(v ∈ F, dist(u,v) = ∞, E_sl)

        = P(E_sl^c) + Σ_{u,v∈[n]} Σ_{T∈𝒯(v)} P(dist(u,v) = ∞, E_sl, D[N_{≤k_1}^-(v)] = T).

We bound the inner sum by writing

    Σ_{T∈𝒯(v)} P(dist(u,v) = ∞, E_sl, D[N_{≤k_1}^-(v)] = T)

        = Σ_{T∈𝒯(v)} P(D[N_{≤k_1}^-(v)] = T) · P(dist(u,v) = ∞, E_sl | D[N_{≤k_1}^-(v)] = T)

        ≤ sup_{T∈𝒯(v)} P(dist(u,v) = ∞, E_sl | D[N_{≤k_1}^-(v)] = T) · Σ_{T∈𝒯(v)} P(D[N_{≤k_1}^-(v)] = T)

        ≤ sup_{T∈𝒯(v)} P(dist(u,v) = ∞, E_sl | D[N_{≤k_1}^-(v)] = T),

the final bound because a sum of probabilities of disjoint events is at most one.

Now fix T ∈ 𝒯(v) and write h = h(T) for the height of T (i.e., the greatest number of
edges on a path ending at the root v). Observe that if D[N_{≤k_1}^-(v)] = T then k_1 = h(T).
We thus have the equality of events

    {D[N_{≤k_1}^-(v)] = T} ∩ {v ∈ F} = {D[N_{≤k_1}^-(v)] = T} = {D[N_{≤h}^-(v)] = T}.

If D[N^i^{v)] = T then the event Eh = {d^(u) > log^n, < log'^n} from Proposi¬ 

tion 4.2 occurs (since in this case u is a flag), so 

{D[N^hiv)] =T} = {I?[iV< Jp)] =T}nEh, 

so P(I?[A^<^(u)] = T}, Eh, Esl) > 0. It follows by Proposition 4.2 that P(dist(u,u) = 
oOjE'sl I E[N^j^^{v)] = T) = 0(n~^). Using this bound, the result follows from the two 
preceding inequalities and the fact that P(£'sl) = 0(n^“'’) = 0(n~^). □ 

We now provide a lower bound for the probability that a fixed vertex is an e-flag. 

Lemma 9.3. For ε > 0 sufficiently small, there is β > 0 such that for n large, P(v ∈
F(ε)) ≥ n^{β−1}.

Proof. We assume n large throughout. Given a tree T, let ki{T) = inf{A; : \Th\ > 
log^n}, let A{T) be the event that \Tk*\ G [elog^ n,log^n], let B{T) be the event that 
maxj<fc* |Tj| < log® n, and C{T) be the event that fci < fc* -|- 5 log^ log n and \Tk^ \ < log® n. 
(We may view a deterministic tree as a random tree in the same way as we may view a 
constant as a random variable, so it is reasonable to call A(T), B(T) and C(T) events 
even if T is deterministic.) 

We first bound the probability that A, B and C occur for a Poisson(r) Galton-Watson 
tree T. If e is sufficiently small then by Lemma 6.4 there is a > 0 such that 

F{A{T)) > a(r(l - 

= an^/2-i, 

where we used the value of k*, that log_r(r(1 − λ_r)) = −η_r^{-1} and that η_r < 1. Next, if
B{E) occurs then let i < k* he minimal such that |7i| > log®n. In order for A(fT) to 
additionally occur the number of descendants of 7) alive at time k* must be less than 
log® n. Writing p for the survival probability of a Poisson(r) branching process, it follows 
as in the proof of Proposition 6.5 that 

F{A{T),B{T)) < P(Bin(log® n,p) < log® n) < n“® , 

the last inequality by a Chernoff bound.

To bound the probability of C'(T), let N = N{E) be the number of vertices in 7fc* with 
at least one descendant in 7fc*+5iog^ logni if Tk* = 0 then N = 0. If C'(T) does not occur 
then one of the following must occur. 

(a) N < log^ re. 

(b) N > log^ re but ki > k* + 5 log^ log re. 

(c) \Tki \ > log® re. 

If A(T) occurs then \Tk* \ > e log® re, so by the branching property (i.e. the independence 
of subtrees rooted at elements of Tk *), we have 

F{A{T),N < log^re) < P(Bin(elog®re,p) < log^ re) < re“® 

for large n, by a Chernoff bound. Next, to have k_1 > k* + 5 log_r log n, every vertex in
Tjf must have fewer than log^re descendants in 7fc«+5iog^ logni so by Lemma 6.4 and the 
branching property we have

P(iV > log^n,fci > k* + 51og^logn) < (P(|7^iog,iogn| e (0,log"^ 

<n~^ 

for large n, the last inequality because r(l — A^) < 1. Finally, by the Markov property and 
the definition of ki, writing Po(t) for a Poisson(t) random variable, we have 

P(|7fci| > log^n) < sup P(Po(rm) > log^ n | Po (rm) > log^ n) 

m<log'* n 

< P(Po(r log^ n) > log^ n — log^ n). 

Standard estimates for the Poisson upper tail (see, e.g., [27], Lemma 1.2) then yield 
P(|7fci| > log^n) < n~^. Combining these bounds, we obtain that, for n large, 

F{A{T),B{T),C{f)) < ^Wn I MT),B{T)) < 3n-3. 

Combining inequalities, and choosing /3 > 0 appropriately, yields 
F{A{T),B{T),C{r)) > an"/2-i _4^-3 > 

Now, if A{T),B{T) and C'(T) all occur then |7 <a:i| < A;* log® n < log^n, so we may use 
Lemma 6.3 to transfer our bound from the Poisson(r) Galton-Watson tree to the tree 
r^^i(u). We obtain 

p(^(r<,^(u)),i?(r<,^(u)),c(r<,^(u))) = + p(^(r),i?(r),c(r)) 

> 2n^-^. 

Given that A{T^f^_^{v)), B{T^f^^{v)) and C{T^f^_^{v) all occur, in order to have v € F(e) it 
is sufficient that is a tree, i.e. that D[N^f^^{v)] = 

Finally, when A{T^j^^{v)), B{T^,^^{v)) and C{T^^^{v)) all occur we have |P(r<^^(i;))| < 
log^n. By Gorollary 5.2, in this case for each element u € P(r<^^(u)), the probability 
that there is a non-tree edge u to V{T^f,^{y)) is at most (rlog'^n)/n. It follows that 

The result follows. □ 


The next lemma is our key tool for controlling joint probabilities of in-neighbourhoods 
of distinct vertices. 


Lemma 9.4. Fix u,v €z [n] and trees T 
V{T) U V{T') C [n] and V{T) n V{T') = ( 

FiD[N-^{u)] = T,D[N-^,iv)] = T') = 


, T', with roots u and v, respectively, and with 
3. Then 

( \V{T)\^ , \y{T')? \ \ 

"V \n-\V{T')C n-\V{T)\)) 

• F{D[N-^{u)] = T) ■ P(Z)[iV-,,(u)] = T'). 


Proof. Recall that T_i is the i-th generation of tree T. Write h and h' for the respective
heights of T and T', and t and t' for their respective sizes. In order that D[N_{≤h}^-(u)] = T,
it is necessary and sufficient that the following events occur.

• For each x ∈ V(T) \ {u}, there is an edge from x to p_T(x) in D; call this event
A_1(u,T).

• There are no other edges within D[V(T)]; call this event A_2(u,T).

• There are no edges from [n] \ V(T) to V(T) \ T_h; call this event A_3(u,T).

Note that A_3 is independent of A_1 and A_2, so we have

nD[N<h{y^)]=T)=nAi{u,T),A2{u,T)) ■ ^ ^ • (9-1) 

We now consider two such events simultaneously. Observe that if T' has root v and height
h', and V(T') ∩ V(T) = ∅, then A_1(u,T) ∩ A_2(u,T) is independent of A_1(v,T') ∩ A_2(v,T').
We thus have

    P(D[N_{≤h}^-(u)] = T, D[N_{≤h'}^-(v)] = T')

        = P(A_1(u,T), A_2(u,T)) · P(A_1(v,T'), A_2(v,T'))

          · P(A_3(u,T), A_3(v,T') | A_1(u,T), A_2(u,T), A_1(v,T'), A_2(v,T')).    (9.2)

Given that A_1(u,T) and A_2(u,T) occur, there are precisely 1 + (r − 1)t edges leaving V(T),
and the heads of these edges are uniformly distributed over [n] \ V(T). The conditional
probability no such edges have head in V(T') \ T'_{h'} is

    ((n − t − t' + |T'_{h'}|) / (n − t))^{1+(r−1)t}.

Similar considerations for edges leaving V{T') and edges with tail in [re] \ {V{T) U V{T')) 
yield the identity 

F{A^{u,T),A^{v,T') I Ai{u,T),A2{u,T),A^{v,T'),A2{v,T')) 

n — t J \ n — f J \ re 

Combined with (9.1) and (9.2), straightforward arguments give 
p(o|JVjj(,.)l = r,D|jv;^,(„)] = r') 

" ° + S) ) = ’’) ■ «"(o["A'(*’)l = T). □ 

Corollary 9.5. For distinct re,re G [re] we have P(re,re G T) < (1 + o(l))(P(re G T) + 
log^® re/re)P(re G F). 

Proof. We first divide according to whether or not fl is empty: 

P(re, re G F) 

=P(re,re G F,A^<^^(^)(re) niV<^^(^)(re) / 0) +P(re,re G F, 7V<^^(^)(re) 0 iV-^^(^)(re) = 0). 

We start with the first term on the right. If re G F then (re))| < log^re, so by 

symmetry 

P(re G F,re G iV<,^(,)(p)) < ^ • P(^; G F). 

Next, by conditioning on we have 

e F,ii ^ („)(“) ,„,(!)) # 0 ) 

< E T n V(T) ^ 01 D[JV;j,(„|(..)| = r). 

{TeT(v)-. 

u^V{T)} 
In order to bound the final probability, first fix T as in the supremum and suppose that
= T. Consider the iBFS procedure starting from u. Recall that at step i, 

R~ is the set of explored vertices and S~ is the set of discovered vertices. Let io = min{i : 
|R“US'“| >log^n}. If u G F then <log^n, so to have = 0 

it suffices that U fl V{T) = 0. Since 11^(1")! < log^n, by Lemma 5.5 we thus 

have 

P(» e F.nnr) ^ «I ^ T) < . 

Together with the two preceding displayed equations, for n large this gives 
P(n,u € / 0) 

r(log'^ n + 1) log'^ n 


log'^ n 

<— -P(u € F) + 

n 

log^® n 

<— -- P(u G F), 

n 


E P(o[Jvj,.,„,Wl = r). 


{TeT(v):u^V{T)} 


n — 2 log' n 


(9.3) 


the last inequality since 

E P(0|"A.(.)(”)l = T) = P(» € F.u ^ "A.(.)(>’)l) ■ 

{Tgr('u):n^V(T)} 

We now turn to the case that and are disjoint. We have 

P(ii,i. e FAn, ,„,(«) n Fn,,,,(») = 0) 

^ p(o[iv^F.(.)(“)l = r,D[JV;,_,„)WI = T'). 

{(T,T')eTiu)xriv): 

V{T)nV{T')=0} 

log^"^ n 


= 1 + 0 


„ E noA<Y,„,(-)l = r)-P(D[jv;^,,„,WI = r'). 

{(T,T')eT{u)xriv): 
y(T)ny(r')=0} 

the last line by Lemma 9.4. (Although $k_1(u)$ and $k_1(v)$ are random, by the same argument
as in Lemma 9.2 we may replace them by the deterministic values $h(T)$ and $h(T')$ without
affecting the probability, so Lemma 9.4 indeed applies.) Summing over all pairs $(T, T') \in \mathcal{T}(u) \times \mathcal{T}(v)$ gives an upper bound, so we obtain
\[
P(u, v \in F,\ N^-_{\le k_1(u)}(u) \cap N^-_{\le k_1(v)}(v) = \emptyset) \le (1 + o(1))\,P(u \in F)\,P(v \in F).
\]

Together with (9.3) this completes the proof. □ 

Corollary 9.6. For all $\varepsilon > 0$, $P(F(\varepsilon) = \emptyset) = o(1)$.

Proof. By Lemma 9.3 and linearity of expectation there is $\beta > 0$ such that $E(|F|) = nP(1 \in F) \ge n^{\beta}$. Next, by Corollary 9.5, for $n$ large we have
\begin{align*}
E(|F|^2) &= \sum_{u, v \in [n]} P(u, v \in F) \\
&= n(n-1)\,P(1, 2 \in F) + nP(1 \in F) \\
&\le (1 + o(1))\,n(n-1)\,P(1 \in F)\,(P(2 \in F) + \log^{®} n/n) + nP(1 \in F) \\
&\le (1 + o(1))\,(nP(1 \in F))^2 + (n \log^{®} n)\,P(1 \in F) \\
&= (1 + o(1))\,(nP(1 \in F))^2.
\end{align*}
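
The Chebyshev step that closes this proof can be written out as follows. This is a short sketch of the standard second-moment argument; it uses only the estimate $E(|F|^2) = (1 + o(1))(E|F|)^2$ from the display above, and the symbol $\mathrm{Var}$ is introduced here for the variance.

```latex
\begin{align*}
P(F = \emptyset) = P(|F| = 0)
  &\le P\bigl(\,\bigl||F| - E|F|\bigr| \ge E|F|\,\bigr) \\
  &\le \frac{\mathrm{Var}(|F|)}{(E|F|)^2}
   = \frac{E(|F|^2)}{(E|F|)^2} - 1
   = o(1).
\end{align*}
```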

The result follows by Chebyshev’s inequality. □
Proof of the lower bound in Theorem 1.1. Fix $\varepsilon > 0$ and write $k^* = (\eta_r - \varepsilon/2) \log_r n$. Suppose
that $D_0$ is attractive, that $|D_0| > n/2$, and that $F(\varepsilon) \subseteq D_0$. Suppose also that for
all w G [n] and j > 0, d^j{w) < (r -|- e)-' log^ n. Under these assumptions, if u € F{e) then 
V ^ Dq. Furthermore, < log^n so for all j > 0, 

^ (^ + e)^ login. 

Writing jo = inf{/ : y{Do) C it follows that (r-|-e)'^° log® n > n/2. Provided e 

is chosen small enough, for n large this implies that jo > (1 — 3e/2) log^ n, so there is some 
node tt € U (Dq) with dist(u, u) > A;* -|- (1 — 3e/2) log^ n = (1 -|- — 2e) log^ n. Altogether, 

this yields 

P(diam(U)o) < (1 + — 2e) log^ n) 

<P(Z)o is not attractive) +P(|U(Z)o)| < n/2) 

+ P(F(e) = 0) + P(3u € F(e) \ Dq) + P(3t(; € [n], j > 0 : dj(w) > {r + e)^ log^ n) 

The first two probabilities were shown to tend to 0 in [17]. The third tends to 0 by
Corollary 9.6, the fourth by Lemma 9.2, and the last by Proposition 6.1. As $\varepsilon > 0$ was
arbitrarily small, the lower bound on $\mathrm{diam}(D_0)$ follows; since $\mathrm{diam}(D) \ge \mathrm{diam}(D_0)$, so
does the lower bound on $\mathrm{diam}(D)$. □

10. The Stationary Distribution 

In this section we prove Theorem 1.2. Recall that $D_0 = D_0(n, r)$ is the largest strongly
connected component of $D$ and that with high probability $D_0$ is attractive [17] and
ergodic [4]. Write $\pi_{\max} = \pi_{\max}(D_0)$ and $\pi_{\min} = \pi_{\min}(D_0)$. Also, write $X = (X_k, k \ge 0)$ for
simple random walk on $D = D(n, r)$.

It is important to distinguish the randomness of the graph $D$ from that of the walk $X$.
For $v \in V(D) = [n]$, write $\mathbf{P}_v$ for the (random) probability measure under which $X$ has the
law of simple random walk on $D$ with $X_0 = v$, and $\mathbf{E}_v$ for the corresponding expectation
operator. It is handy to have a concrete description of $X$ under $\mathbf{P}_v$, as follows. Recall
that $D$ has edges $\{(i, L_{i,j}) : (i, j) \in [n] \times [r]\}$ (this is the “canonical construction” from the
introduction). Let $(U_k, k \ge 0)$ be independent and uniformly distributed over $\{1, \ldots, r\}$.
Then set $X_0 = v$ and for $k \ge 0$ let $X_{k+1} = L_{X_k, U_k}$.
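
The canonical construction and the walk recursion above translate directly into code. The following is a minimal sketch, not part of the paper: the function names `sample_r_out_digraph` and `simple_random_walk` are hypothetical, and vertices and edge labels are 0-indexed rather than 1-indexed as in the text.

```python
import random

def sample_r_out_digraph(n, r, rng):
    """Return L with L[i][j] = head of the j-th edge leaving vertex i.

    Heads are iid uniform over the n vertices, so loops and multiple
    edges are allowed, as in the canonical construction.
    """
    return [[rng.randrange(n) for _ in range(r)] for _ in range(n)]

def simple_random_walk(L, v, steps, rng):
    """Run X_{k+1} = L[X_k][U_k] from X_0 = v; return the lists (X, U)."""
    r = len(L[0])
    X, U = [v], []
    for _ in range(steps):
        u = rng.randrange(r)       # U_k: uniform choice among the r out-edges
        U.append(u)
        X.append(L[X[-1]][u])      # follow the U_k-th edge leaving X_k
    return X, U

rng = random.Random(0)
L = sample_r_out_digraph(n=50, r=3, rng=rng)
X, U = simple_random_walk(L, v=0, steps=20, rng=rng)
```

Keeping the edge table `L` separate from the labels `U` mirrors the distinction drawn in the text between the randomness of the graph and that of the walk: `L` can be sampled once and then reused for many independent walks.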

10.1. Bounding $\pi_{\max}$. Fix $k \ge 1$ and view $D[N^-_{\le k}(v)]$ as a maze, which a random walk
attempting to reach $v$ must navigate. The maze entrances are the elements of $N^-_k(v)$, and
the treasure lies at $v$. Suppose that the random walk follows an edge $e$ from $N^-_{\le k}(v)$ to
its complement. After following the edge, the random walk’s position has distance greater
than $k$ from $v$. Since the distance to $v$ decreases by at most one in a single random walk
step, this means that in order to reach $v$ after leaving $N^-_{\le k}(v)$, the random walk must pass
through $N^-_k(v)$: it must restart at one of the maze entrances.

With the preceding paragraph in mind, for positive integer $h$ we say that $D[N^-_{\le k}(v)]$ is
$h$-hard if for every directed path $P$ from $N^-_k(v)$ to $v$ within $D[N^-_{\le k}(v)]$, we have
\[
\#\{u \in V(P) : |E(u, N^-_{\le k}(v))| = 1\} \ge h.
\]
Perhaps more picturesque: the maze is $h$-hard if no matter what entrance is chosen, along
any potential path to the treasure there are at least $h$ locations where only a single direction
stays within the maze; the other $(r - 1)$ possibilities deposit the searcher outside of the
maze walls.
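
To make the definition concrete, the largest $h$ for which a given maze is $h$-hard can be computed by a shortest-path search with 0/1 vertex costs. The sketch below is an illustration only and does not appear in the paper: the helper names `in_ball` and `hardness` are hypothetical, the digraph is stored as a 0-indexed edge table `L` with `L[u][j]` the head of the $j$-th edge leaving `u`, and edges are counted with multiplicity.

```python
import heapq
import math
from collections import deque

def in_ball(L, v, k):
    """Return {u: dist(u, v)} for all u with dist(u, v) <= k (reverse BFS)."""
    rev = [[] for _ in L]
    for u, heads in enumerate(L):
        for w in heads:
            rev[w].append(u)
    dist = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue                      # do not explore beyond distance k
        for u in rev[x]:
            if u not in dist:
                dist[u] = dist[x] + 1
                queue.append(u)
    return dist

def hardness(L, v, k):
    """Largest h such that the maze around v of radius k is h-hard.

    Returns math.inf when there is no entrance (the condition is vacuous).
    """
    dist = in_ball(L, v, k)
    # cost 1 for vertices with exactly one out-edge staying inside the maze
    cost = {u: int(sum(w in dist for w in L[u]) == 1) for u in dist}
    best = {u: cost[u] for u, d in dist.items() if d == k}   # entrances
    heap = [(c, u) for u, c in best.items()]
    heapq.heapify(heap)
    while heap:                           # Dijkstra with vertex costs
        c, u = heapq.heappop(heap)
        if c > best[u]:
            continue
        if u == v:
            return c
        for w in L[u]:
            if w in dist and c + cost[w] < best.get(w, math.inf):
                best[w] = c + cost[w]
                heapq.heappush(heap, (best[w], w))
    return math.inf

tower = [[3, 3], [0, 3], [1, 3], [3, 3]]      # 2 -> 1 -> 0, all else falls to 3
shortcut = [[3, 3], [0, 3], [1, 0], [3, 3]]   # vertex 2 has two ways to stay inside
```

In `tower`, the only path from the entrance 2 to the treasure 0 passes two vertices with a single edge staying in the maze, so `hardness(tower, 0, 2)` is 2. In `shortcut` with radius 1, the entrance 2 keeps both of its edges inside the maze and reaches 0 directly, so the hardness drops to 0.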

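For $S \subseteq [n]$ let $\tau_S = \inf\{k \ge 0 : X_k \in S\}$ and let $\tau_S^+ = \inf\{k > 0 : X_k \in S\}$.

Lemma 10.1. For $k \ge 1$, if $D[N^-_{\le k}(v)]$ is $h$-hard then
\[
\pi(v) \le \frac{1}{r^h \cdot \mathbf{P}_v\bigl(\tau_{[n] \setminus N^-_{\le k}(v)} < \tau_v^+\bigr)}.
\]

Proof. If the maze is $h$-hard then from any $u \in N^-_k(v)$,
\[
\mathbf{P}_u\bigl(\tau_v < \tau_{[n] \setminus N^-_{\le k}(v)}\bigr) \le r^{-h}. \tag{10.1}
\]
To see this, simply note that in order to have $\tau_v < \tau_{[n] \setminus N^-_{\le k}(v)}$ the walk must visit at
least $h$ vertices $w \in N^-_{\le k}(v)$ with $|E(w, N^-_{\le k}(v))| = 1$. But for such a vertex $w$ we have
$\mathbf{P}_w(X_1 \in N^-_{\le k}(v)) = 1/r$, and the inequality follows by the Markov property.

We now use that
\[
\frac{1}{\pi(v)} = \mathbf{E}_v(\tau_v^+) \ge \mathbf{E}_v\bigl(\tau_v^+ \mid \tau_{[n] \setminus N^-_{\le k}(v)} < \tau_v^+\bigr) \cdot \mathbf{P}_v\bigl(\tau_{[n] \setminus N^-_{\le k}(v)} < \tau_v^+\bigr).
\]
Let $K$ be the number of visits to $N^-_k(v)$ before the walk visits $v$. Since the inequality (10.1)
holds for all $u \in N^-_k(v)$, it follows that for all $w \in [n] \setminus N^-_{\le k}(v)$ we have $\mathbf{E}_w(\tau_v) \ge \mathbf{E}_w(K) \ge r^h$. Therefore
\[
\mathbf{E}_v\bigl(\tau_v^+ \mid \tau_{[n] \setminus N^-_{\le k}(v)} < \tau_v^+\bigr) \ge \inf_{w \in [n] \setminus N^-_{\le k}(v)} \mathbf{E}_w(\tau_v) \ge r^h,
\]
and the result follows. □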


Lemma 10.2. Fix (5 > 0 and let £* = {1 — 5) log^ n. Then 

=0(0 • 

Proof. Choose a > 0 small, and let A be the event that for all A: > 0 and all v G [n] we 
have d~jf{v) < (r + log^ n. By Proposition 6.1, we have P(^) = 0(n~'^). Assuming a is 
small enough with respect to 5, we also have 

t 

l^<£*(^)l = = ^d]:(v} < (r + log^ n < , 

fc=o 
so 

P(|lV<^*(u)| > | ^) + p(]4) = 0{n-^) . □ 

Proposition 10.3. Fix <5 > 0 and let £* = {1 — 6) log^ n and h = {1 — 26) log^ n. Then 

Pj Pi D[N^^,{v)] is h-hard\ = 1 — 0{n~^). 

\i;G[n] / 

Proof. Recall from the introduction that $D = D(n, r)$ has edges $\{(i, L_{i,j}) : (i, j) \in [n] \times [r]\}$.
Fix $v \in [n]$. For each $k \ge 1$ let $T_{\le k} = T_{\le k}(D, v)$ be the iBFS tree of $D[N^-_{\le k}(v)]$ described
in Section 3. For $w \in [n]$, $w \ne v$, let $Y(w) = |E(w, N^-_{\le \ell^*}(v))| - 1$, and let $Y(v) = |E(v, N^-_{\le \ell^*}(v))|$. Observe that there may be multiple edges from $w$ to a vertex $u \in N^+(w)$,
so $Y(w)$ may not equal $|N^+(w) \cap N^-_{\le \ell^*}(v)| - 1$.

For $w \ne v$, the parent of $w$ in $T_{\le \ell^*}$ lies in $N^+(w) \cap N^-_{\le \ell^* - 1}(v) \subseteq N^-_{\le \ell^*}(v)$, so $Y(w) \ge 0$
for all $w \in N^-_{\le \ell^*}(v)$. The key insight of the proof is that if $D[N^-_{\le \ell^*}(v)]$ is not $h$-hard then
there is some simple path $P$ in $D[N^-_{\le \ell^*}(v)]$ from $N^-_{\ell^*}(v)$ to $v$ along which at least $|P| - h$
vertices $w$ have $Y(w) > 0$. To show that $P(D[N^-_{\le \ell^*}(v)] \text{ is not } h\text{-hard})$ is small it thus
suffices to show that with high probability no such path exists.
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The sets {T(t(;) : w € (u)} are conditionally independent given T < i*. Further¬ 
more, given T-., by Corollary 5.2 we also have ¥( 10 ) ^ Bin(r — 1, for all 

w E N~^,{v). 

Let A be the event that |iV<£«('u)| < For S C [n], it follows that on A the 

random variable B{S) = |{w; E S' : T (w) > 0}| is stochastically dominated by Bin(|S|, (r — 
Furthermore, by Lemma 10.2 we have P(A) = 1 — 0{n~^). 

We would like to conclude as follows. Let $S$ be any path from $N^-_{\ell^*}(v)$ (the last generation
of $T_{\le \ell^*}$) to $v$. The arguments of the preceding paragraphs suggest the bound
\[
P(B(S) \ge |S| - h \mid A) \le P(\mathrm{Bin}(|S|, r n^{-\delta/2}) \ge |S| - h) \le 2^{|S|} (r n^{-\delta/2})^{|S| - h}.
\]
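
The last inequality in this display is the elementary estimate $P(\mathrm{Bin}(t, p) \ge m) \le \binom{t}{m} p^m \le 2^t p^m$, obtained from a union bound over the possible sets of $m$ successes. The following sketch compares the exact tail with this crude bound numerically; the function names are hypothetical and the parameter values in any check are arbitrary illustrations, not values from the paper.

```python
from math import comb

def binomial_upper_tail(t, p, m):
    """Exact value of P(Bin(t, p) >= m)."""
    return sum(comb(t, j) * p**j * (1 - p) ** (t - j) for j in range(m, t + 1))

def crude_bound(t, p, m):
    """The bound 2^t * p^m used in the text (with m = t - h)."""
    return 2**t * p**m
```

The bound is far from tight, but it is all the argument needs: with $p = r n^{-\delta/2}$ the factor $p^{t-h}$ is polynomially small in $n$, which is what eventually absorbs the number of candidate paths in the union bound.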

On A we have |r^L| < |TJ^« | < so there are less than m ■ r* paths of length t from 

to V. Now use the preceding inequality and a union bound over paths of length t and 
over t > i*. 

To make the preceding argument rigorous, we need to deal with the fact that the set of
paths from $N^-_{\ell^*}(v)$ to $v$ is random (even conditional on $T_{\le \ell^*}$), as such paths may follow edges of
$D[N^-_{\le \ell^*}(v)]$ which are not edges of $T_{\le \ell^*}$. To do so, condition on $T_{\le \ell^*}$, fix $w \in N^-_{\ell^*}(v)$
and a string $s = s_1 s_2 \ldots s_t \in [r]^*$ of length $|s| = t$. This string uniquely specifies a path
$P = P(w, s) = (p_i(w, s), 0 \le i \le t)$ in $D$: at step $i$ follow the $s_i$-th edge leaving the current
vertex. Formally, we let $p_0 = w$ and, for $1 \le i \le t$, let $p_i = L_{p_{i-1}, s_i}$.
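
The encoding of paths by strings can be sketched in a few lines. This is an illustration under the same assumptions as the canonical construction: `L[u][j]` is the head of the $j$-th edge leaving `u`, everything is 0-indexed, and the names `path_from_string`, `is_simple` and the small digraph `L` are hypothetical.

```python
def path_from_string(L, w, s):
    """Return [p_0, ..., p_t] with p_0 = w and p_i = L[p_{i-1}][s_i]."""
    path = [w]
    for label in s:
        path.append(L[path[-1]][label])   # follow the labelled out-edge
    return path

def is_simple(path):
    """True if no vertex is repeated along the path."""
    return len(set(path)) == len(path)

L = [[1, 2], [2, 0], [0, 0]]              # a 2-out digraph on {0, 1, 2}
```

Since every path of length $t$ from $w$ arises from one of the $r^t$ strings, a union bound over strings covers all paths, whichever edges of $D$ they happen to use.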

We reveal the path $P$ edge-by-edge, starting from $w$. By the independence of the
sets $Y(u)$, for each $0 \le i \le t$, given that the sub-path $p_0, \ldots, p_i$ is simple (in particular
$p_i \notin \{p_0, \ldots, p_{i-1}\}$) then $Y(p_i)$ is conditionally independent of $p_0, \ldots, p_{i-1}$ and of
$Y(p_0), \ldots, Y(p_{i-1})$. It follows that

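\[
P\bigl(Y(p_i) > 0 \mid T_{\le \ell^*}, A, (p_0, \ldots, p_i) \text{ simple path in } D[N^-_{\le \ell^*}(v)], (Y(p_j), j < i)\bigr) \le r n^{-\delta/2}.
\]
By repeated conditioning, we obtain
\begin{align*}
P\bigl(P(w, s) \text{ is a simple path in } D[N^-_{\le \ell^*}(v)],\ B(P(w, s)) \ge t - h \mid T_{\le \ell^*}, A\bigr)
&\le P(\mathrm{Bin}(t, r n^{-\delta/2}) \ge t - h) \\
&\le 2^t (r n^{-\delta/2})^{t - h}.
\end{align*}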


Now let $A_w(t)$ be the event that there is a simple path $P$ of length $t$ starting from $w$ and
staying within $D[N^-_{\le \ell^*}(v)]$, for which $B(P) \ge t - h$. All possible such paths are described
by a string $s \in [r]^t$, so by the preceding inequality and a union bound,

r(A.(t) I r<,., A) < (2r)‘(rn-*/2)i-<. < ^ 

Since i* — h> dlogn, this yields that 


/ \ 

U U A^(t) I r<,*,A 

\weN~ (v) y 


< n 


(2r^) 


2 \e* 


n5(e*-h)l2 




Since P(A) = 1 — 0{n this bound also holds unconditionally. But, as described in the 
first two paragraphs of the proof. 


(u)] is not /i-hard} C U U Au,{t ), 

wGN- {v) 


SO P(Z)[AI^^,(ti)] is not /i-hard) = 0(n ^). A union bound over v E [re] completes the 
proof. □
Before proving our bounds on $\pi_{\max}$ we require one final result, which says that with
high probability there is at least one escape route along each path from $N^-_{\log\log n}(v)$ to $v$,
for all $v$.

Lemma 10.4. For $v \in [n]$ let $E_v$ be the event that each path from $N^-_{\log\log n}(v)$ to $v$ contains
at least one vertex $w$ with $|N^+(w) \cap N^-_{\le \ell^*}(v)| = 1$. Then


P I fl 

k i;e[n] 


= 1 - 0{n-^) 


Note that the event $E_v$ is not the event that $D[N^-_{\le \log\log n}(v)]$ is 1-hard: in $E_v$ we
require the vertex $w$ to send the searcher not outside of $D[N^-_{\le \log\log n}(v)]$ but rather out of
the larger maze $D[N^-_{\le \ell^*}(v)]$. The proof of Lemma 10.4 follows the same lines as that of
Proposition 10.3 but is simpler, and is omitted.

Theorem 10.5. For every $\varepsilon > 0$, with high probability we have
\[
\frac{1}{n} \le \pi_{\max} \le \frac{1}{n^{1 - \varepsilon}}.
\]

Proof. The lower bound holds deterministically since $\sum_{v \in [n]} \pi(v) = 1$.

To prove the upper bound, fix $v \in [n]$ and $\delta \in (0, \varepsilon/2)$. Write $\ell^* = (1 - \delta) \log_r n$, and
$T_{\le \ell^*} = T_{\le \ell^*}(D, v)$.

First, note that if N+{v) \ / 0 then (^Xi 0 ^<iogiogn(^)) ^ V^- We 

have 

<P(l"Jlogl„g„(»')l > + *■("*(«>) \ "iloslog„(») = »I AAslojJOl < 

=0{n-^) + , 

the first by Proposition 6.1 and the second by Corollary 5.2. 

Next, if the event Ey from Lemma 10.4 occurs then for all w ^ -^<iogiogn(^) 

have (v) — Markov property, it follows that if N^{v) \ 

^iiogiognM 0 and Ey, then 

-)>lp.(Mf'A';,,,,.,,!..)) >4. 


[n]\N, 


< 


-2 


By the preceding paragraph and Lemma 10.4, we thus have P^, — T; j ^ ^ 

with P-probability 1 — 0(n“^/^). 

Finally, if in addition D[N^^,{v)] is /i-hard then by Lemma 10.1 we obtain that 7r{v) < 
j,/i -2 < (1 -)- o(l))n“^^“^^^. By Proposition 10.3 we thus have 7r{v) < (1 + o(l))n“^^“^'^) 
with probability 1 — A union bound over v G [n] then completes the proof. □ 

10.2. Bounding $\pi_{\min}$. We bound $\pi_{\min}$ from below using the following lemma.

Lemma 10.6. Let $D$ be any $r$-out regular digraph. If $D$ is ergodic and has diameter
$\mathrm{diam}(D) \le d$, then
\[
\pi_{\min} \ge \frac{1}{1 + d r^d}.
\]

Proof. Fix $v \in V(D)$. For any $k \in [d]$ and $u \in N^-_k(D, v)$, let $K(u, k) \ge 1$ be the number
of directed paths of length $k$ from $u$ to $v$. Observe that the probability of following each
such path is precisely $r^{-k}$ since $D$ is $r$-out regular. Furthermore, since $\pi$ is stationary, it
satisfies
\[
\pi(v) \ge \sum_{u \in N^-_k(v)} \pi(u) \cdot \frac{K(u, k)}{r^k} \ge \sum_{u \in N^-_k(v)} \frac{\pi(u)}{r^k}.
\]
By averaging over $k \in [d]$ we have
\[
\pi(v) \ge \frac{1}{d} \sum_{k=1}^{d} \sum_{u \in N^-_k(v)} \frac{\pi(u)}{r^k} \ge \frac{1 - \pi(v)}{d r^d},
\]
the last inequality since $\mathrm{diam}(D) \le d$ so $\bigcup_{k=1}^{d} N^-_k(v) = V(D) \setminus \{v\}$. The lemma follows. □
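
The bound of Lemma 10.6 can be checked numerically on a small example. The sketch below is only an illustration and is not part of the paper: the function names and the three-vertex digraph `L` are hypothetical, the stationary distribution is approximated by power iteration (which converges here because the example is ergodic), and the diameter is computed by breadth-first search, ignoring unreachable pairs as in (1.1).

```python
from collections import deque

def stationary_distribution(L, iterations=500):
    """Approximate pi by iterating pi <- pi P, where P(u, w) = #edges(u, w) / r."""
    n, r = len(L), len(L[0])
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [0.0] * n
        for u, heads in enumerate(L):
            for w in heads:
                nxt[w] += pi[u] / r       # each out-edge carries mass pi[u] / r
        pi = nxt
    return pi

def diameter(L):
    """Largest finite dist(u, v) over ordered pairs, via BFS from every vertex."""
    best = 0
    for s in range(len(L)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for w in L[x]:
                if w not in dist:
                    dist[w] = dist[x] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

L = [[1, 2], [2, 0], [0, 0]]              # ergodic 2-out digraph on {0, 1, 2}
r = len(L[0])
pi = stationary_distribution(L)
d = diameter(L)
lower_bound = 1.0 / (1 + d * r**d)
```

For this example the stationary distribution is $(4/9, 2/9, 3/9)$ and the diameter is 2, so the lemma guarantees $\pi_{\min} \ge 1/9$, while the true minimum is $2/9$.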


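Theorem 10.7. For every $\varepsilon > 0$ we have
\[
\frac{1}{n^{1 + \eta_r + \varepsilon}} \le \pi_{\min} \le \frac{1}{n^{1 + \eta_r - \varepsilon}}
\]
with high probability.

Proof. Fix $\varepsilon > 0$ small. It is a straightforward consequence of Theorem 1.1 and Lemma 10.6
that $\pi_{\min} \ge n^{-(1 + \eta_r + \varepsilon)}$ with high probability. It remains to show that $\pi_{\min}$ is small with
high probability.

Let $k^* = (\eta_r - \varepsilon/2) \log_r n$, and recall from Section 9 the definition of the set $F = F(\varepsilon)$
of $\varepsilon$-flags. In particular, if $v \in F(\varepsilon)$ then $D[N^-_{\le k^*}(v)]$ is a tree. It follows that if $v \in F(\varepsilon)$
then $D[N^-_{\le k^*}(v)]$ is $k^*$-hard.

Let $A$ be the event that $\pi_{\max} > n^{-(1 - \varepsilon/6)}$. By Corollary 9.6, $P(F = \emptyset) = o(1)$, and by
Theorem 10.5, $P(A) = o(1)$. Therefore
\[
P(\pi_{\min} > n^{-(1 + \eta_r - \varepsilon)}) = P(\pi_{\min} > n^{-(1 + \eta_r - \varepsilon)},\ F \ne \emptyset,\ A^c) + o(1). \tag{10.2}
\]
Fix $v \in [n]$, and for $u \in [n]$ let $K(u)$ be the number of paths of length $k^*$ from $u$ to $v$.
Using the stationarity of $\pi$ we have
\[
\pi(v) = \sum_{u \in [n]} \pi(u) \cdot \frac{K(u)}{r^{k^*}}.
\]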

If u is a flag then D[N^j^,(v)] is a tree, K{u) = 0 for u ^ Nj^,{v) and K{u) = 1 for 
u G Nif,{v). In this case we also have |iV/l(u)| < log^n. Finally, on A we have 7r{u) < 
j 7 ,-(i-e/ 6 )_ the event {u G F} n we thus obtain the bound 


7r(u) < |lV.*(u)| 


n 


-(I-e/6 ) < 


log^ n 


7^1+»?r—2e/3 


< 


1 


l+Vr—e 


n 


In other words, on $A^c$, every vertex $v \in F$ deterministically satisfies $\pi(v) \le n^{-(1 + \eta_r - \varepsilon)}$, so
in this case if $F$ is non-empty then $\pi_{\min} \le n^{-(1 + \eta_r - \varepsilon)}$. It follows that the probability on
the right of (10.2) is zero, so $P(\pi_{\min} > n^{-(1 + \eta_r - \varepsilon)}) = o(1)$,
as required. □


Proof of Theorem 1.2. The theorem is now an immediate consequence of Theorems 10.5
and 10.7. □